Partitioning of Switches and Fabrics into Logical Switches and Fabrics

ABSTRACT

A Layer 2 network switch is partitionable into a plurality of switch fabrics. The single-chassis switch is partitionable into a plurality of logical switches, each associated with one of the virtual fabrics. The logical switches behave as complete and self-contained switches. A logical switch fabric can span multiple single-chassis switch chassis. Logical switches are connected by inter-switch links that can be either dedicated single-chassis links or logical links. An extended inter-switch link can be used to transport traffic for one or more logical inter-switch links. Physical ports of the chassis are assigned to logical switches and are managed by the logical switch. Legacy switches that are not partitionable into logical switches can serve as transit switches between two logical switches.

TECHNICAL FIELD

The present invention relates to the field of network fabric virtualization and in particular to virtualization of network fabrics through virtualization of switches.

BACKGROUND ART

Switch-based network fabrics have been a major part of the development of storage area networks (SANs) in modern networking environments. Scalability of large Layer 2 (L2) fabrics has become a problem, as end users require ever-larger L2 fabrics, while also desiring consolidation of SAN islands and better tools for managing increasingly complex SANs and other switch-based fabrics.

SUMMARY OF INVENTION

In accordance with one embodiment, a method of managing a network switch comprises partitioning a first network switch into a first plurality of logical switches, and managing each of the plurality of logical switches independent of each other of the plurality of the first plurality of logical switches. In accordance with a second embodiment, a method of partitioning network switches comprises partitioning the first network switch into a first plurality of virtual switch fabrics, partitioning the first network switch into a first plurality of logical switches; and associating the first logical switch with a first virtual fabric of the first plurality of virtual switch fabrics.

In accordance with another embodiment, a network switch comprises a switch, partitionable into a plurality of logical switches, wherein each of the plurality of logical switches is a complete and self-contained network switch, a processor, a storage medium, connected to the processor, a chassis management system, stored on the storage medium, wherein the chassis management system, when executed by the processor, causes the processor to perform actions that are associated with the switch as a whole, and a logical switch management system, stored on the storage medium, wherein the logical switch management system, when executed by the processor, causes the processor to perform actions associated with any of the plurality of logical switches.

In accordance with yet another embodiment, a computer readable medium stores software for partitioning a network switch, the software for instructing a processor of the network switch to perform actions comprising partitioning the network switch into a first plurality of logical switches, and managing each of the plurality of logical switches independent of each other of the plurality of the first plurality of logical switches.

In accordance with yet another embodiment, a network comprises a plurality of external devices, a plurality of chassis, each comprising a single-chassis fabric, and a switch configured for use with the fabric, a first multi-chassis virtual fabric coupling the plurality of external devices, wherein the first multi-chassis virtual fabric comprises a first virtual single-chassis fabric to which are coupled a first portion of the plurality of external devices, the first virtual single-chassis fabric selected from a plurality of virtual fabrics configured from the single-chassis fabric of a first chassis of the plurality of chassis, and a second virtual single-chassis fabric to which are coupled a second portion of the plurality of external devices, the second virtual single-chassis fabric selected from a plurality of virtual fabrics configured from the single-chassis fabric of a second chassis of the plurality of chassis, and a software stored on a storage medium of each of the plurality of chassis, the software for instructing a processor of the corresponding chassis to perform actions comprising partitioning the single-chassis fabric of the chassis into a plurality of virtual single-chassis fabrics, associating a virtual single-chassis fabric of the plurality of virtual single-chassis fabrics with the multi-chassis virtual fabric, partitioning the switch into a plurality of logical switches, and assigning a first logical switch of the plurality of logical switches to the multi-chassis virtual fabric.

BRIEF DESCRIPTION OF DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an implementation of apparatus and methods consistent with the present invention and, together with the detailed description, serve to explain advantages and principles consistent with the invention. In the drawings,

FIG. 1 is a block diagram illustrating a high-level view of one embodiment of partitioning a chassis into logical switches;

FIG. 2 is a block diagram illustrating an example of a plurality of logical switches organized into a plurality of virtual fabrics;

FIG. 3 is a block diagram illustrating another example of a plurality of logical switches organized into a plurality of virtual fabrics;

FIG. 4 is a block diagram illustrating inter-switch links between the logical switches of FIG. 3;

FIG. 5 is a block diagram illustrating one use of partitioning a chassis into logical switches;

FIG. 6 is a block diagram illustrating one use of partitioning a fabric into virtual fabrics;

FIG. 7 is a block diagram illustrating a virtual fabric Meta Storage Area Network (Meta SAN);

FIG. 8 is a block diagram illustrating using dedicated inter-switch links to connect logical switches in a two-chassis embodiment;

FIG. 9 is a block diagram illustrating using logical inter-switch links to connect logical switches in a two-chassis embodiment;

FIG. 10 is a block diagram illustrating a logical topology connecting logical switches in a multi-chassis embodiment;

FIG. 11 is a block diagram illustrating a virtual fabric composed of multiple chassis and a legacy L2 fabric according to one embodiment;

FIG. 12 is a block diagram illustrating an example of using a base fabric containing long-distance links to connect virtual fabrics;

FIG. 13 is a block diagram illustrating a high-level architecture for partitioning a chassis according to one embodiment;

FIG. 14 is a block diagram illustrating a hierarchy of logical switches, logical interfaces, physical interfaces, and ports according to one embodiment;

FIG. 15 is a block diagram illustrating frame encapsulation for transmission across a logical inter-switch link between two logical switches according to one embodiment;

FIG. 16 is a block diagram illustrating a high-level software architecture for partitioning a chassis into virtual fabrics according to one embodiment;

FIG. 17 is a block diagram illustrating one embodiment of control path layering for partitioning a chassis into virtual fabrics according to the embodiment of FIG. 16;

FIG. 18 is a block diagram illustrating one embodiment of data path layering for partitioning a chassis into virtual fabrics according to the embodiment of FIG. 16;

FIG. 19 is a block diagram illustrating one embodiment of a logical fabric manager according to one embodiment;

FIG. 20 is a flowchart illustrating one embodiment of a transmit flow path for a frame received on a logical port;

FIG. 21 is a flowchart illustrating one embodiment of a receive path for a frame received on a logical port;

FIG. 22 is a flowchart illustrating one embodiment of a technique for encapsulating a frame traveling across a logical inter-switch link (LISL);

FIG. 23 is a flowchart illustrating one embodiment of a technique for decapsulating a frame traveling across an LISL;

FIG. 24 is a block diagram illustrating one example of connecting multiple chassis using a virtual fabric;

FIG. 25 is a block diagram illustrating one embodiment of frame header processing as a frame traverses the virtual fabric of FIG. 24;

FIG. 26 is a block diagram illustrating another example of connecting multiple chassis using a virtual fabric;

FIG. 27 is a block diagram illustrating one embodiment of frame header processing as a frame traverses the virtual fabric of FIG. 26;

FIG. 28 is a block diagram illustrating an example frame flow for an outbound logical F port;

FIG. 29 is a block diagram illustrating an example frame flow for an inbound logical F port;

FIG. 30 is a block diagram illustrating an example state machine used in one embodiment for the creation of a logical inter-switch link;

FIG. 31 is a block diagram illustrating a network of switch chassis and end-user devices according to one embodiment;

FIG. 32 is a block diagram illustrating the network of FIG. 31 with the switch chassis partitioned into logical switches according to one embodiment;

FIG. 33 is a block diagram illustrating the network of FIG. 31 with the assignment of logical switches to virtual fabrics and inter-switch links connecting the virtual fabric;

FIG. 34 is a block diagram illustrating a hardware implementation for partitioning a network switch into multiple logical switches according to one embodiment; and

FIG. 35 is a block diagram illustrating a header processing unit of the embodiment of FIG. 34.

The figures depict embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.

DESCRIPTION OF EMBODIMENTS

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details. In other instances, structure and devices are shown in block diagram form in order to avoid obscuring the invention. References to numbers without subscripts are understood to reference all instances of subscripts corresponding to the referenced number. Moreover, the language used in this disclosure has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter.

Although some of the following description is written in terms that relate to software or firmware, embodiments can implement the features and functionality described herein in software, firmware, or hardware as desired, including any combination of software, firmware, and hardware. References to daemons, drivers, engines, modules, or routines should not be considered as suggesting a limitation of the embodiment to any type of implementation.

FIG. 1 illustrates one example of partitioning a switch in a single chassis into multiple logical switches. Although the following description is set forth in the context of a Fibre Channel (FC) switch chassis, the present invention is not limited to Fibre Channel technology and could be implemented in other types of switch-based fabrics. Furthermore, “fiber” is used throughout this description as a generic term that can indicate either an optical or a copper cable.

Chassis 100 is an embodiment of a Fibre Channel switch chassis. In a default configuration, the entire switch can be considered as a single logical switch 110. According to the embodiments described herein, the switch of chassis 100 can be partitioned into multiple logical switches, illustrated in FIG. 1 as logical switches 120, 130, and 140. Although this example and many of the following examples of partitioning show partitioning a switch into three logical switches, the cardinality of the partitioning is illustrative only and limited to a small number of logical switches for clarity of the drawings.

Each logical switch 120, 130, and 140 acts as a single Fibre Channel switch, with a collection of zero or more user visible ports. Each logical switch 120, 130, and 140 can support at least E, F, and FL ports, as those port types are defined by the Fibre Channel standards. Each logical switch 120, 130, and 140 behaves as a complete and self-contained FC switch, with fabric services, configuration, and all fabric characteristics associated with a physical FC switch. Logical switch 120 is designated in FIG. 1 as a default logical switch. In one embodiment, all switch ports not assigned to other logical switches, such as logical switches 130 and 140, are assigned to the default logical switch 120. If the chassis 100 is configured with only one logical switch 110, then logical switch 110 is considered the default logical switch and all ports are considered part of logical switch 110.

Management of chassis 100 is performed as management of a collection of logical switches, whether there is only one logical switch 110 or a plurality of logical switches 120, 130, and 140. Some chassis management functions, for example, the partition configuration management, span logical switch boundaries, but users can manage each logical switch independently.

In addition to partitioning a chassis into logical switches, the logical switches are assigned to virtual fabrics, also known as logical fabrics. In one embodiment, each logical switch is assigned to a different virtual fabric, and only one logical switch can be associated with a virtual fabric in a particular chassis. A virtual fabric can be a single-chassis virtual fabric, or can span multiple chassis, which allows creating multi-chassis virtual fabrics comprised of logical switches in different chassis. In the following disclosure, references to a fabric should be understood as a reference to a virtual fabric unless otherwise stated.

Embodiments of chassis management functions related to partitioning the chassis into virtual switches include creating a logical switch, assigning the logical switch to a virtual fabric, adding ports to the logical switch, deleting ports from the logical switch, deleting the logical switch, and changing the assignment of the logical switch to a different virtual fabric. In some embodiments, security constraints can be placed on the chassis management functions, such as requiring permission to effect any chassis management operations. Additionally, users can be given rights to control one virtual fabric in a chassis, but not another.

Physical ports on the chassis are assigned to logical switches. Chassis management functions allow moving ports between logical switches in one embodiment, forcing a port offline when moved from one logical switch to another. In one embodiment, a logical switch with zero ports assigned to it is automatically deleted.
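By way of illustration only, the port-assignment behavior described above can be sketched as follows. The class and method names are hypothetical and are not taken from any embodiment. The sketch assumes that a logical switch is keyed by the FID of its virtual fabric, that the default logical switch is exempt from automatic deletion, and that automatic deletion is evaluated when the last port leaves a logical switch.

```python
class Chassis:
    """Toy model of one chassis partitioned into logical switches.

    All names are hypothetical. A logical switch is keyed by the FID of
    the virtual fabric to which it is assigned.
    """

    def __init__(self, physical_ports, default_fid):
        self.default_fid = default_fid
        # Ports not assigned elsewhere belong to the default logical switch.
        self.switches = {default_fid: set(physical_ports)}
        self.offline = set()

    def create_logical_switch(self, fid):
        # Only one logical switch per virtual fabric within a chassis.
        if fid in self.switches:
            raise ValueError(f"FID {fid} already has a logical switch")
        self.switches[fid] = set()

    def owner(self, port):
        # Return the FID of the logical switch that currently holds the port.
        for fid, ports in self.switches.items():
            if port in ports:
                return fid
        raise KeyError(port)

    def move_port(self, port, dst_fid):
        if dst_fid not in self.switches:
            raise KeyError(f"no logical switch for FID {dst_fid}")
        src_fid = self.owner(port)
        if src_fid == dst_fid:
            return
        self.switches[src_fid].remove(port)
        self.switches[dst_fid].add(port)
        # A port is forced offline when moved between logical switches.
        self.offline.add(port)
        # Assumption: a non-default logical switch left with zero ports
        # is deleted at this point.
        if not self.switches[src_fid] and src_fid != self.default_fid:
            del self.switches[src_fid]


# Port assignment resembling FIG. 1 (the FID-to-switch mapping is assumed).
chassis = Chassis([f"P{n}" for n in range(1, 9)], default_fid=1)
chassis.create_logical_switch(2)
chassis.create_logical_switch(3)
for port, fid in [("P5", 2), ("P6", 2), ("P7", 3), ("P8", 3)]:
    chassis.move_port(port, fid)
```

In this sketch, moving ports P7 and P8 back to the default logical switch would leave the logical switch for FID 3 empty, and it would then be removed.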

Because physical ports are assigned to logical switches, the concept of a user port is introduced. A user port is a port assigned to a logical switch and bound to a physical port. Each logical switch has its own port index, but unlike a conventional switch without logical switches, the port index values are associated with a user port number, and depending on the configuration of the chassis, may not be the same as the physical port number. FC addresses include the user port number and are dynamically allocated when a port is assigned to a logical switch. In one embodiment, FC addresses are not unique across logical switches, because user port numbers are not unique across logical switches. In one embodiment, physical and user port numbers within a chassis do not change, regardless of the logical switch to which the port is assigned. Therefore, when a port is moved from one logical switch to another, both physical and user port numbers stay unchanged. In that embodiment, the port indexes are assigned at the time of being added to a logical switch and are assigned sequentially. When a port is removed from the logical switch, the port index slot becomes free.
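A minimal sketch of the per-logical-switch port index described above follows. The name PortIndexTable is hypothetical, and the policy of reusing the lowest free slot before extending the table is an assumption; the description states only that indexes are assigned sequentially and that a slot becomes free on removal.

```python
class PortIndexTable:
    """Hypothetical port index table for a single logical switch."""

    def __init__(self):
        # slots[i] holds the user port number at port index i, or None if free.
        self.slots = []

    def add(self, user_port):
        # Assumption: reuse the lowest free slot before appending a new one.
        for index, occupant in enumerate(self.slots):
            if occupant is None:
                self.slots[index] = user_port
                return index
        self.slots.append(user_port)
        return len(self.slots) - 1

    def remove(self, user_port):
        # Free the slot; the user port number itself is unchanged.
        index = self.slots.index(user_port)
        self.slots[index] = None
        return index


table = PortIndexTable()
first = table.add(17)    # user port 17 receives the first index
second = table.add(4)    # user port 4 receives the next index
freed = table.remove(17)
reused = table.add(9)    # user port 9 takes the freed slot
```

The port index therefore varies with the order in which ports join a logical switch, while the user port number stays with the port.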

Returning to FIG. 1, the example physical switch 100 is illustrated with eight physical ports, designated P1 through P8. In embodiments of such a switch, the physical ports P1 through P8 are typically implemented on one or more edge switch ASICs of the physical switch 100 that are internally connected to core switches for intra-chassis data traffic. The edge switches are not managed independently and are transparent to any external devices connected to the switch 100, so the division into edge switches is not significant for the purpose of this application and is not shown in FIG. 1. For purposes of clarity of the example drawing of FIG. 1, only eight physical ports on the switch 100 are shown, although such switches typically have 16 or more ports, and may have any number of desired ports.

According to embodiments described below, the physical switch 100 illustrated in FIG. 1 is partitioned into three logical switches 120, 130, and 140, each of which is assigned to a different virtual fabric, illustrated as fabrics 1, 2, and 3 in FIG. 1, as indicated by the Fabric Identification (FID) value. Logical switch 120 is shown as having a different number of ports than either logical switch 130 or 140. Physical ports can be assigned to the logical switches 120, 130, and 140 by configuring the logical switches. Although only three logical switches are shown in FIG. 1, actual implementations of the physical switch can typically be partitioned into other numbers of logical switches as desired by the switch operator.

As illustrated in FIG. 1 and described in more detail below, physical switch 100 is partitioned so that physical port P1 is assigned as physical port PL1 to logical switch 120. Physical ports P2, P3, and P4 are assigned to logical switch 120 as physical ports PL2, PL3, and PL4. Physical ports P5, P6, P7, and P8 are assigned to logical switches 130 and 140. When external device 150 connects to port PL2 of logical switch 120, it connects to the same physical port designated P2 in the unpartitioned switch 100, but the port is managed in logical switch 120 by controlling the logical switch port PL2.

Similarly, physical ports P5 and P6 are assigned to logical switch 130 as ports PL1 and PL2 of logical switch 130 and physical ports P7 and P8 are assigned to logical switch 140 as ports PL1 and PL2. Because the logical switches 120, 130, and 140 are managed independently, in some embodiments, each can have ports identified using the same port numbers as each other logical switch. As shown in FIG. 1, external device 160 is logically connected to port PL2 of logical switch 130 (the same port number as port PL2 of logical switch 120), which is port P6 of the unpartitioned physical switch 100.

As described below, the ports of logical switches 120, 130, and 140 are connected to external devices or can be connected to ports of other switches of other chassis in the same virtual fabric through inter-switch links, which can be dedicated physical links connecting physical switch 100 to another physical switch, or logical links that use the services of other physical links to carry the traffic across the logical link. The other chassis need not be capable of partitioning into logical switches. Port PL3 of logical switch 130 is a logical port not directly associated with any physical port that is used for such logical links, with logical port PL3 connected via a logical link that traverses port PL2 of logical switch 140 and a physical link to switch 170 to a logical port of a logical switch of switch 170, as is described in more detail below. The partitioning shown in FIG. 1, including port and fabric numbers, is by way of example and illustrative only and the physical switch 100 can be configured with other partitioning as desired.

FIG. 2 is an example of a collection of chassis 200 partitioned into three virtual fabrics, with each virtual fabric spanning the collection of chassis 200. In this example, logical switches 210, 250, and 270 are all part of a first virtual fabric, and are given an FID of 1. Logical switches 220, 240, and 280 are part of a second virtual fabric, and are given FID 2. Logical switches 230, 260, and 290 are part of a third virtual fabric, and are assigned FID 3. Inter-switch links (ISLs) are defined among the logical switches in each virtual fabric as is described in more detail below. The fabric IDs illustrated in FIG. 2 and other figures discussed below are illustrative and only for example. Any desired technique for identifying fabrics can be used.

A base fabric is a routable network that carries traffic for multiple virtual fabrics. A base fabric is formed by connecting specially designated logical switches from each chassis. These special logical switches are called base switches. ISLs within the base fabric are called eXtended ISLs (XISLs). XISLs are, by default, shared by all virtual fabrics, although sharing can be limited to one or more fabrics to provide quality of service (QoS). Logical links created between logical switches across the base fabric are called Logical ISLs (LISLs). LISLs represent reachability between logical switches across a base fabric and are not related to XISL topology. A base fabric can also contain legacy L2 switches since multi-fabric traffic is carried using encapsulated headers, as discussed in more detail below.

ISLs connected to a physical port of a non-base switch are called Dedicated ISLs (DISLs). These DISLs are dedicated to a particular logical switch and only carry traffic for a virtual fabric associated with the logical switch. In other words, E_ports associated with a base switch form XISLs, while E_ports associated with a non-base switch form DISLs. Even if an XISL is shared by only one fabric, it still carries protocol traffic associated with multiple fabrics, in addition to carrying data traffic for just that one fabric. In some embodiments, a base fabric can also be configured to have DISLs. For example, a non-base switch can be used within a base fabric to connect two base switches. In such a case, a DISL carries traffic within the base fabric, which is multi-fabric by nature.
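The distinction among the three link types can be summarized in a short sketch. The enumeration and function names are hypothetical; the sketch assumes a link is characterized only by whether its port is a logical port and whether the owning logical switch is a base switch, and it does not model a DISL used inside a base fabric.

```python
from enum import Enum


class IslKind(Enum):
    XISL = "extended ISL"
    DISL = "dedicated ISL"
    LISL = "logical ISL"


def classify_isl(port_is_logical, switch_is_base):
    """Classify an inter-switch link from the port that terminates it."""
    # A link terminating on a logical port is a logical link carried
    # across the base fabric.
    if port_is_logical:
        return IslKind.LISL
    # A physical E_port forms an XISL on a base switch and a DISL otherwise.
    return IslKind.XISL if switch_is_base else IslKind.DISL


base_link = classify_isl(port_is_logical=False, switch_is_base=True)
dedicated_link = classify_isl(port_is_logical=False, switch_is_base=False)
logical_link = classify_isl(port_is_logical=True, switch_is_base=False)
```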

Preferably, a base fabric is kept unburdened with unnecessary configuration and protocols, so that the chance of segmenting or disrupting the shared resource is minimized. Thus, in one embodiment, F_ports within a base fabric are prohibited. In other embodiments, F_ports can be in a base fabric as required for legacy configuration support and migration.

ISLs to link logical switches in a virtual fabric can be either direct links between logical switches, or can be LISLs defined over XISLs. In the latter situation, illustrated in FIG. 3, logical switches 310 are configured in each chassis 300 as base logical switches. ISLs 320 are defined to connect the base logical switches 310 into a single fabric, here given FID 255. The base logical switches 310 are logical switches, and can be, but do not have to be, the default logical switch for their respective chassis 300. The ISLs 320 are configured as XISLs, which can be shared to carry traffic for multiple fabrics. Thus, the logical switches 330 that are assigned an FID of 1 in the example of FIG. 3 would communicate with each other by routing traffic to the base logical switch 310, and then to other logical switches 330 across the XISLs 320 using a logical link (not shown in FIG. 3) between the logical switches 330 as disclosed below.

Alternately, logical switches in a virtual fabric can use DISLs to communicate directly with other logical switches in the same virtual fabric. FIG. 4 illustrates one example of such a configuration, with both LISLs and DISLs. As in FIG. 3, XISLs 320 connect logical switches 310 that are assigned FID 255. But now, LISLs 410 connect the logical switches 330 that are assigned FID 1, LISLs 420 connect the logical switches 340 that are assigned FID 300, and LISLs 430 connect the logical switches 350 that are assigned FID 72. LISLs are a logical representation for a connection through a base fabric between two logical switches in a virtual fabric. A LISL behaves like a regular E port-connected ISL, allowing FC services over LISLs. Traffic for LISLs 410, 420, and 430 all traverses the logical switches 310 and XISLs 320, which are shared between the virtual fabrics assigned FIDs 1, 300, and 72.

As shown in FIG. 4, a virtual fabric can have both dedicated ISLs (DISLs) and LISLs. For example, DISL 440 connects logical switches 350, in addition to LISLs 430.

DISL 440 connects physical E_ports that are assigned to logical switches 350, allowing an ISL for the exclusive use of traffic within the virtual fabric identified with FID 72. LISLs connect to logical switches through logical ports, such as the logical ports 450 that connect LISL 410 c between logical switches 330 b and 330 c.

FIGS. 5, 6, and 7 illustrate some advantages of using virtual fabrics as disclosed herein. FIG. 5 illustrates consolidation of fabrics. Individual fabrics 500, 510, and 520 are consolidated into a Meta SAN using corresponding virtual fabrics 500 a, 510 a, and 520 a. FIG. 6 illustrates partitioning a single fabric 600 into three virtual fabrics 610, 620, and 630. Traffic in each of the virtual fabrics 610, 620, and 630 is managed separately from and isolated from traffic in each of the other virtual fabrics partitioned from fabric 600. FIG. 7 illustrates the concept of creating a virtual fabric Meta SAN 700, by combining single-chassis fabrics 710, 720, 730, and 740 into the multi-chassis virtual fabric 700. Each virtual fabric can grow independently subject only to chassis limitations, and selected connectivity can be established across virtual fabrics through routing.

FIGS. 8 and 9 illustrate two ways to connect virtual fabrics. In FIG. 8, DISLs 830 connect logical switches 800, 810, and 820 in different chassis, using normal E ports. As illustrated in FIG. 8, legacy L2 switches 840 that cannot be partitioned into logical switches and virtual fabrics can be made part of a virtual fabric by connecting them with DISLs to a virtual fabric. These are examples only, and other techniques, including mixtures of LISLs and DISLs, can be used, as illustrated above in FIG. 4. FIG. 9 illustrates using LISLs 910 to connect logical switches 930 in chassis 900 a with logical switches 940 of chassis 900 b to form multi-chassis virtual fabrics, via base logical switches 960. Base logical switches 960 are connected with XISLs 970 through another L2 fabric 980, which can be a legacy L2 switch and does not have to be capable of configuration with logical switches or virtual fabrics. In addition, legacy L2 switches 950 are connected to logical switches 930 and 940 using DISLs 920, creating multi-chassis virtual fabrics with the logical switches 930 and 940, even though legacy L2 switches 950 are not virtual fabric-capable.

If two logical switches within a virtual fabric are reachable through the base fabric of that chassis, a LISL connects them together. LISL connectivity is not affected by the underlying XISL configuration. If two logical switches are no longer reachable through the base fabric, the LISL connecting the logical switches is taken offline. All logical switches within a virtual fabric are typically fully connected to one another in a full mesh topology.
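The reachability rule above can be illustrated with a short sketch that derives the LISLs of one virtual fabric from the XISL connectivity of the base fabric. The function names and the example topology are hypothetical, chassis are identified by simple labels, and one LISL per reachable pair is assumed.

```python
from collections import deque
from itertools import combinations


def reachable(adjacency, start):
    """Return every chassis reachable from start over XISLs (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def logical_topology(xisls, members):
    """Compute the LISLs for one virtual fabric.

    xisls   -- iterable of (chassis, chassis) pairs; parallel XISLs may repeat
    members -- chassis that host a logical switch of the virtual fabric
    """
    adjacency = {}
    for a, b in xisls:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    lisls = set()
    for a, b in combinations(sorted(members), 2):
        # One LISL per reachable pair, however many XISLs lie between them.
        if b in reachable(adjacency, a):
            lisls.add((a, b))
    return lisls


# Hypothetical topology: two parallel XISLs a-b, and transit chassis c.
xisls = [("a", "b"), ("a", "b"), ("b", "c"), ("c", "d")]
mesh = logical_topology(xisls, members={"a", "b", "d"})
# Losing the c-d XISL takes the LISLs to d offline.
degraded = logical_topology(xisls[:3], members={"a", "b", "d"})
```

In this sketch chassis c only carries transit traffic and contributes no logical switch, so it does not appear in the resulting full mesh.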

As shown above, DISLs can also be used in addition to LISLs. For example, if there are three DISLs between two logical switches along with three XISLs, four ISLs are seen from the L2 protocol perspective: three DISLs and one LISL. Some embodiments can provide support for only DISLs, while other embodiments can also provide support for LISLs and XISLs, in addition to DISLs.
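The arithmetic of the preceding example can be written as a one-line sketch. The function name is hypothetical, and a single LISL per pair of logical switches is assumed, which is why the number of XISLs does not appear as a parameter.

```python
def visible_isl_count(disl_count, base_fabric_reachable):
    """ISLs seen by the L2 protocol between two logical switches.

    Assumes at most one LISL per pair, present only when the two logical
    switches can reach each other through the base fabric.
    """
    return disl_count + (1 if base_fabric_reachable else 0)


# Three DISLs plus any number of XISLs: three DISLs and one LISL.
with_base_fabric = visible_isl_count(3, base_fabric_reachable=True)
without_base_fabric = visible_isl_count(3, base_fabric_reachable=False)
```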

If a LISL needs to be brought up, a virtual fabric protocol first allocates a logical port through the help of infrastructure and associates the logical port with a logical switch. A port online event is generated to the logical switch to start the process of bringing up the LISL connected at the logical port. Logical switch operations on the logical port and the logical ISL are performed without being aware of the logical nature of the entities. A LISL can be taken down by setting the logical port in an offline state, or de-allocated explicitly by the virtual fabric protocol.
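The bring-up and take-down sequence described above might be sketched as follows. The class names, the event tuple, and the two state strings are hypothetical and are not drawn from the state machine of any embodiment.

```python
class LogicalPort:
    """Hypothetical logical port associated with one logical switch."""

    def __init__(self, port_id, switch_fid):
        self.port_id = port_id
        self.switch_fid = switch_fid
        self.state = "offline"


class VirtualFabricProtocol:
    """Hypothetical sketch of LISL bring-up and take-down."""

    def __init__(self):
        self.ports = {}     # allocated logical ports, keyed by port id
        self.events = []    # events delivered to logical switches
        self._next_id = 0

    def bring_up_lisl(self, switch_fid):
        # Allocate a logical port and associate it with the logical switch.
        port = LogicalPort(self._next_id, switch_fid)
        self._next_id += 1
        self.ports[port.port_id] = port
        # A port online event starts LISL bring-up on the logical switch.
        port.state = "online"
        self.events.append(("PORT_ONLINE", switch_fid, port.port_id))
        return port

    def take_down_lisl(self, port_id, deallocate=False):
        # Setting the logical port offline takes the LISL down.
        port = self.ports[port_id]
        port.state = "offline"
        # Alternatively the protocol can de-allocate the port explicitly.
        if deallocate:
            del self.ports[port_id]


protocol = VirtualFabricProtocol()
lisl_port = protocol.bring_up_lisl(switch_fid=1)
```

The logical switch in this sketch sees only an ordinary port online event, consistent with its being unaware of the logical nature of the port.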

FIG. 10 depicts conversion from physical connectivity to a virtual fabric topology. Five chassis 1000 are connected in the top picture with virtual fabric 1010 built out of chassis 1000 a, 1000 b, 1000 d, 1000 e, and 1000 f. All links 1020 are XISLs. The bottom picture shows the virtual fabric topology of the top picture where all logical switches 1030 are connected in an all-to-all topology. Even though there are two XISLs 1020 between chassis 1000 a and chassis 1000 b, there is only one LISL 1040 between logical switch 1030 a and logical switch 1030 b. In other embodiments, multiple LISLs can connect two logical switches. In terms of hardware programming, chassis 1000 c, which is not part of the virtual fabric 1010, does not require any route programming for virtual fabric 1010 and is not required to be virtual fabric capable.

In one embodiment, a chassis running a virtual fabric-capable FC protocol can be configured to be one or more logical switches without the need of enabling other virtual fabric features, such as device sharing or virtual fabric. Each logical switch can be treated as a normal L2 switch and can be used in a user environment just as a legacy L2 switch. If desired, the virtual fabric capability can be enabled to expand the legacy L2 fabric.

FIG. 11 illustrates an example of an expansion of a legacy L2 fabric using virtual fabrics. Legacy L2 fabric 1110 can be used to connect virtual fabric-capable chassis 1100, allowing the creation of a virtual fabric 1120 that includes legacy L2 fabric 1110. As before, new virtual fabric 1120 can include DISLs 1140 to connect some of the logical switches 1160 of the virtual fabric 1120, DISLs 1150 to connect with the legacy L2 fabric 1110, and XISLs 1130 to connect logical switches 1170, allowing creation of LISLs (not shown in FIG. 11) to connect logical switches 1160 in chassis 1100 c and 1100 d.

If a virtual fabric is created across long distance XISLs, then disruption of the XISL would result in a virtual fabric wide disruption. Even if precautions to minimize disruptions are taken, merging previously independent base fabrics across long distance links to create one large base fabric to share devices can pose several problems. One problem is related to potential fabric ID conflict. Since these previously independent base fabrics were configured independently, the same fabric IDs may have been used to create virtual fabrics. In such a case, fabric IDs must be reconfigured to resolve the conflict and such a configuration change can be disruptive in some embodiments. Another problem is related to disruption of routing within the base fabric during fabric merge. Since these previously independent base fabrics were brought up independently, the same domain IDs may have been assigned to the base logical switches. In such a case, domain IDs must be reconfigured to allow the merge and such operation can require disabling the affected base logical switches in some embodiments. Although these issues are typically one-time occurrences, potential disruptions can be severe.

In order to alleviate these problems, a hierarchical base fabric configuration can be used. A primary base fabric is used to create virtual fabrics and share devices among them within a locality, while a secondary base fabric is used to share devices across long distance. In other words, long distance XISLs are only present within the secondary base fabric. Logical switches associated with the secondary base fabric are physically connected to virtual fabrics associated with primary base fabrics to provide device sharing. Such a configuration can be achieved by having separate chassis to create two separate base fabrics. In some embodiments, both primary and secondary base fabrics can coexist within a single chassis. In such a configuration, logical switches are associated with either base fabric during creation.

FIG. 12 illustrates an example of a hierarchical base fabric topology. Instead of connecting base fabrics 1210 from site 1200 a and site 1200 b to create a single base fabric across a long distance link between the sites, a separate base fabric 1240 is created and virtual fabrics 1250 associated with the new base fabric 1240 are used to physically link between the virtual fabrics 1220 and 1230. FIG. 12 is meant to portray a high-level fabric topology for such a hierarchical configuration, without detailing logical switches, etc. Such a topology can be fulfilled using separate chassis for all three base fabrics, or a chassis in each site can contain both the primary base fabrics and part of the secondary base fabric 1240. In such a hierarchical configuration, failure of the long distance link in the base fabric 1240 would disconnect sites 1200 a and 1200 b from each other, but the virtual fabrics in those sites 1200 would remain intact for local communication.

We now turn to a description of embodiments of an architecture that can provide virtual fabrics as described above. FIG. 13 provides a high-level overview of one embodiment of such an architecture. Element 1310 provides support for virtual fabrics. Element 1320 provides support for FC device sharing. Element 1340 provides support for the base routing infrastructure of the switching apparatus. Element 1350 provides support for logical switches. In addition, element 1330 provides support for other switching applications. Together, these elements provide for partitioning the switching apparatus into logical or virtual fabrics such as have been disclosed above.

FIG. 14 illustrates one embodiment of an interface hierarchy 1400 that allows for virtual fabrics. The interface hierarchy 1400 is a many-to-many relationship. A single interface is created for every reachable base fabric domain. A logical fabric manager (LFM) (discussed in more detail below) runs a logical topology algorithm to determine the LISLs to be created, based on user configuration selections. For every LISL created by the LFM to a domain b, the LFM creates a logical port in the switch and associates it with the logical interface (LIF) corresponding to the base fabric domain b. LFM also sends the list of interfaces corresponding to the XISLs to reach base fabric domain b and creates an association between the LIFs and the regular interfaces.

In FIG. 14, LIF 1410 a corresponds to a first base fabric domain and LIF 1410 b corresponds to a second base fabric domain. The LFM creates an LISL from logical switch 1420 a to a logical switch 1420 b on the first base fabric domain. To do this, the LFM creates a logical port on logical switch 1420 a corresponding to this LISL and calls a function of the LIF 1410 a that forms an association between the logical port and the LIF 1410 a. Similarly, when the LFM creates an LISL between logical switch 1420 b and logical switch 1420 a, an association is formed between the corresponding logical port of logical switch 1420 b and the LIF 1410 a.

The LIF to IF mapping in FIG. 14 defines the XISLs that need to be used for tunneling the frames that traverse the LISLs. IFs 1430 connect to the physical ports 1440 associated with each IF 1430.

LIF 1410 c in FIG. 14 corresponds to a logical F-port. For every F-port created in the chassis, an LIF is created, as well as an association between all logical F-ports associated with that particular regular F-port and the LIF 1410 c.

This LIF 1410 c is in turn associated with the interface 1430 corresponding to the regular F-port, which in FIG. 14 is IF 1430 d. The arrows between the logical switch 1420 c and IF 1430 d define this association.

FIG. 15 illustrates one embodiment of a frame flow when routing frames in a virtual fabric across an XISL 1580. When there are no XISLs, routing works just like in conventional L2 switches by properly configuring the routing tables in the physical ports. Each logical switch has its own view of the fabric, routes, etc. Each logical switch is managed as if it were a conventional L2 switch. When XISLs are part of a virtual fabric, then L3 routing is used to route frames across the virtual fabric. The base logical switch for a source chassis identifies the chassis and base logical switch yielding the shortest path to a destination switch in the virtual fabric. That shortest path is then used in the base fabric to get to the destination chassis. When using multiple chassis, multipathing can be available in some embodiments. Some embodiments can switch from L3 routing to port-based routing when running out of hardware resources.

In some embodiments, each virtual fabric uses a different virtual channel (VC) over the XISL 1580. In one embodiment, up to four virtual channels can be used. If available, Quality of Service (QoS) VCs are maintained across the base fabric.

When routing frames for a virtual fabric over XISLs, and both DISLs and LISLs are available paths, the routing algorithm in one embodiment gives preference to DISLs over LISL routing over XISLs if the paths through the DISLs and XISLs have an equal number of hops. In other embodiments, other preferences can be configured.
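
The tie-break described above can be sketched as follows. This is a minimal illustration only: the Path type and the choose_path function are hypothetical names, and the rule that a path with fewer hops always wins is an assumption, since the description states only the preference for DISLs when hop counts are equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    kind: str  # "DISL", or "LISL" for a logical link tunneled over XISLs
    hops: int


def choose_path(paths):
    """Pick the path with the fewest hops (assumed); on a tie, prefer a DISL."""
    return min(paths, key=lambda p: (p.hops, 0 if p.kind == "DISL" else 1))


candidates = [Path("LISL", 2), Path("DISL", 2), Path("DISL", 3)]
best = choose_path(candidates)
```

Here the two-hop DISL is selected over the two-hop LISL; an embodiment with other configured preferences would replace the sort key.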

As shown in FIG. 15, a frame 1500 is sent over DISL 1510 to a physical port (P20) associated with logical switch 1520, with destination information indicating that the frame should be delivered over DISL 1570 associated with logical switch 1550. The frame is encapsulated by the logical switch for routing over the XISL 1580 at physical port P100, as shown by encapsulated frame 1560 that is formed according to the Fibre Channel Inter-Fabric Routing (FC-IFR) working draft standard. When the frame is received by the base logical switch 1540 at physical port P200 after traversing the XISL 1580, the frame is decapsulated back to frame 1500, which can then be delivered by logical switch 1550 over DISL 1570.

FIGS. 16-18 illustrate three different views of software layering for virtual fabrics according to one embodiment. FIG. 16 is a high-level stack view, FIG. 17 is a control path layering view, and FIG. 18 is a data path layering view.

FIG. 16 is a block diagram that illustrates a high-level software architecture stack for partitioning a chassis into virtual fabrics according to one embodiment. Logical fabric manager (LFM) 1610 is responsible for creating and maintaining a virtual fabric topology for each virtual fabric of the chassis. The Partition Manager 1620 provides configuration about each partition in the chassis to the LFM 1610, sending the LFM 1610 information about the partitions configured in the local chassis, and for each partition, configuration information, such as the fabric id associated with the partition and other relevant information. LFM 1610 also interacts with the base fabric's fabric shortest path first (FSPF) module 1830 (best illustrated in FIG. 18) to learn about domains in the base fabric and the base fabric topology. Configuration information is then exchanged with all other LF capable chassis in the base fabric. All partitions configured with the same fabric id would belong to the same virtual fabric. Based on the configuration information and base fabric topology, the LFM 1610 creates LISLs for each virtual fabric to establish the control path topology. In addition, the LFM 1610 coordinates with the virtual fabric FSPF module 1830 to establish a full mesh of phantom ISLs (PISLs) between logical switches for data paths.
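
The rule that all partitions configured with the same fabric id belong to the same virtual fabric can be illustrated with a short sketch. The function name group_by_fabric, the tuple layout, and the chassis names are assumptions made for illustration and are not part of the described embodiments.

```python
from collections import defaultdict


def group_by_fabric(partitions):
    """Map each fabric id to the (chassis, partition id) pairs configured with it."""
    fabrics = defaultdict(list)
    for chassis, partition_id, fabric_id in partitions:
        fabrics[fabric_id].append((chassis, partition_id))
    return dict(fabrics)


# Hypothetical configuration exchanged between two chassis in the base fabric.
config = [("chassis-A", 1, 10), ("chassis-A", 2, 20), ("chassis-B", 1, 10)]
virtual_fabrics = group_by_fabric(config)
```

In this example, partition 1 of each chassis shares fabric id 10, so the two partitions would be members of one virtual fabric, while fabric id 20 has a single member.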

The functionality of the LFM 1610 can be considered as follows: (a) maintaining a database of configuration information about each chassis in the base fabric such as the partition id and the fabric id for all the partitions in the chassis, the chassis' switch capabilities, etc.; (b) creating and deleting LISLs for control paths with a selected set of logical switches as determined by a logical adjacency determination algorithm; and (c) coordinating with the FSPF 1830 to facilitate a full mesh of destination router switch (DRS)-based connectivity PISLs for data paths between all logical partitions.

In addition to the Logical Fabric Manager 1610 and the Partition Manager 1620, which have a single instance for an entire chassis, the software architecture illustrated in FIG. 16 includes modules that have separate instances for each logical switch into which the chassis is partitioned. The per-switch instances include a Fibre Channel Protocol and Manageability Services module 1630, which generally controls the software resources related to the Fibre Channel hardware for the partition created by the Partition Manager 1620. A Fibre Channel Protocol module 1640 provides services to the FC Protocol and Manageability Services module 1630 that relate to the Fibre Channel protocol. A Switch Abstraction module 1650 provides services related to abstraction of the logical switch from the physical hardware, using a Logical Port Manager (LPM) 1660 to manage the ports that are associated with the logical switch. The LPM 1660 also interacts with the Logical Fabric Manager 1610. The Logical Port Manager 1660 uses the services of a Fibre Channel Tunneling Protocol module 1670 for handling FC over FC encapsulation and decapsulation of traffic across LISLs defined for the logical switch. The Logical Port Manager 1660 uses the hardware drivers 1680 to manage the physical ports that are associated with the logical switch.

Irrespective of the topology of the virtual fabric, a logical switch according to various embodiments can send a data frame directly to another logical switch in the virtual fabric by encapsulating the frame in an inter-fabric routing (IFR) header and sending it over the base fabric to the destination switch's base fabric domain (DRS), as shown above in FIG. 15. To take advantage of this fact, the data path for the virtual fabric is preferably a full mesh between logical switches. In addition, the FSPF 1830 in L2 switches in the edge fabric might misroute the frames through less optimal paths unless their link state database (LSDB) reflects a full mesh of links between logical switches. Unless the logical topology that LFM 1610 creates for control paths is full mesh, in some embodiments the LISLs cannot be used for a data path, because the underlying hardware cannot route frames from one LISL to another LISL, which amounts to removing the IFR header and adding a new IFR header to the same frame.

In some embodiments, the data path is decoupled from the control path. Irrespective of the set of LISLs that the LFM 1610 creates for a virtual fabric, the LSDB of the virtual fabric's FSPF 1830 preferably reflects a full mesh of PISLs between all the logical switches. The cost of these links is the cost of the corresponding paths in the base fabric. For example, if domain A is connected to domain B in the base fabric by two hops, the link cost of each being 0.5, the data path link cost between A and B in the virtual fabric is 1.0.
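
The cost rule in the example above can be worked through in a short sketch that sums link costs along the cheapest base fabric path. The use of Dijkstra's algorithm, the function name base_path_cost, and the intermediate domain "X" are assumptions chosen for illustration; the description states only that the PISL cost equals the cost of the corresponding base fabric path.

```python
import heapq


def base_path_cost(links, src, dst):
    """Cheapest path cost in the base fabric; links maps domain -> {neighbor: cost}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return cost
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, link_cost in links.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    return None  # dst is unreachable in the base fabric


# Domain A reaches domain B through a hypothetical intermediate domain X,
# with each of the two hops costing 0.5.
base_fabric = {"A": {"X": 0.5}, "X": {"A": 0.5, "B": 0.5}, "B": {"X": 0.5}}
pisl_cost = base_path_cost(base_fabric, "A", "B")
```

With the two 0.5-cost hops of the example, the resulting PISL cost between A and B is 1.0.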

The base fabric FSPF 1830 provides a distance metric for each base fabric domain from the local chassis. Calculating and programming routes with the PISLs directly results in ASICs being programmed to encapsulate the frame at a logical switch port using the DRS and also to route at a base fabric port based on the FID/DID of the encapsulated frame.

The LPM 1660, as the name suggests, manages all logical ports in the chassis for a logical switch. The LFM 1610 creates a logical port with LPM 1660 for each LISL that it creates. Other embodiments can have other kinds of logical ports and the LPM 1660 in some embodiments is designed to handle different logical ports supporting different protocols (e.g., FC over FC, FC over IP, VSAN F ports, etc.).

In a different view, FIG. 17 illustrates a control path layering according to one embodiment. The LFM 1610 interacts with the Partition Manager 1620 and the LPM 1660 for each logical switch. The Partition Manager 1620 interacts with instances of the modules for each logical switch through a Switch Driver 1710. The Switch Driver 1710 controls the logical switch through interaction with a Fibre Channel driver 1750. The Fibre Channel driver 1750 uses the services of a fabric server 1720, a name server 1730, and a zone server 1740 that provide services related to Fibre Channel fabrics, names, and zones to the entire chassis. The LPM 1660 instance for the logical switch gains access to those services through the FC Driver 1750. The LPM 1660 interacts with a Fibre Channel Tunnel Driver 1760 to control LISLs tunneling across XISLs, and the ASIC driver 1770 to control traffic across physical ports assigned to the logical switch, such as for a DISL. The ASIC Driver 1770 in turn drives the ASIC hardware 1780 to control the physical ports assigned to the logical switch.

The LPM 1660 maintains (a) a list of logical ports in the system and corresponding LIF objects, (b) attributes and properties associated with each logical port and (c) association between the logical ports and other physical ports. Association is many-to-many and in one embodiment is strictly hierarchical (a parent-child relationship). For example, in one embodiment, a VSAN F port is a parent of the underlying blade port in the chassis. Any frame that needs to be transmitted on the VSAN F port needs to be transmitted on the underlying blade port. An FC over FC (tunnel) logical port that is created for a LISL is a parent containing as its children the set of physical ports that belong to the base logical switch and that can be used to reach the DRS that is at the other end of the LISL. A frame that arrives on a particular port can be meant for any of its parents.

The LPM 1660 maintains callbacks for different protocols and provides de-multiplexing of control flow for frame transmit and receive based on the specified protocol.

The FC tunnel driver 1760 understands the FC over FC tunneling protocol and performs the encapsulation and de-encapsulation necessary for a virtual fabric frame to be tunneled in the base fabric. The tunnel driver 1760 registers with LPM 1660 handler functions for transmit and receive requests for the FC over FC protocol.

When a frame needs to be sent on a logical port or a LIF, an ops handler for the LIF object calls the tunnel driver 1760 (via the LPM 1660 infrastructure) to add the IFR header, providing it with the DRS, Source Fabric Identifier (SFID) and Destination Fabric Identifier (DFID) to use. The tunnel driver 1760 returns the encapsulated frame, which is then transmitted on one of the physical ports that is a child of the LIF object.

When a tunneled frame is received on a logical port, the ASIC driver 1770 calls the tunnel driver 1760 (via the LPM 1660 infrastructure) to handle the frame. The tunnel driver 1760 infers the fabric id and base fabric source domain from the frame header. The tunnel driver 1760 is then able to identify uniquely one of the parent logical ports as a recipient of the frame, based on the header information. The tunnel driver 1760 then delivers the frame to the logical port by calling the receive function for the LIF object.
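
The receive-side lookup described above, in which a frame arriving on a physical port is matched to exactly one of that port's parent logical ports, can be sketched as follows. The PhysicalPort class, its method names, and the logical port labels are hypothetical; the sketch assumes that the pair of fabric id and base fabric source domain is sufficient to select the parent, as the description indicates.

```python
class PhysicalPort:
    """A base fabric port that can be a child of several logical ports."""

    def __init__(self, name):
        self.name = name
        self._parents = {}  # (fabric_id, source_domain) -> logical port label

    def add_parent(self, fabric_id, source_domain, logical_port):
        key = (fabric_id, source_domain)
        if key in self._parents:
            raise ValueError(f"parent already registered for {key}")
        self._parents[key] = logical_port

    def demux(self, fabric_id, source_domain):
        """Return the parent logical port for a tunneled frame, or None."""
        return self._parents.get((fabric_id, source_domain))


port = PhysicalPort("P200")
port.add_parent(fabric_id=10, source_domain=1, logical_port="lport-fid10-dom1")
port.add_parent(fabric_id=20, source_domain=1, logical_port="lport-fid20-dom1")
recipient = port.demux(fabric_id=20, source_domain=1)
```

Keying the parents by the header fields keeps the lookup unambiguous even though one physical port serves LISLs of several virtual fabrics.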

The ASIC driver 1770 manages the ASIC resources of the chassis and is responsible for populating the appropriate registers and tables of the hardware 1780 based on various kinds of information provided by other kernel modules, particularly the fabric operating system routing module.

For example, based on the information provided, the ASIC driver 1770 programs the routing information in the routing tables used by the hardware 1780 for the physical ports.

The switch driver 1710 supports logical ports, including supporting the creation and deletion of logical ports, IOCtls on logical ports, and interacting with the LPM 1660 and LIF objects as needed. Ops functions for the LIF objects are provided by the switch driver 1710.

In some embodiments, the switch driver 1710 also sends domain controller frames through PISLs (even though there is no port associated with a PISL) and supports other virtual fabric and routing related IOCtls.

Although the previous description is written in the context of software, as is described below, the encapsulation of frames described above may be performed by hardware in the ASIC instead of the various drivers described above. In one embodiment, firmware drivers may perform these actions for control path frames originating in the CPU of the switch, while the hardware performs these actions for datapath frames, performing encapsulation and decapsulation in addition to L3 routing.

A third view, related to data path layering, is presented by FIG. 18. Data from the LFM 1610 flows through the base fabric management portion 1810 of the LFM 1610 to the LPM 1660. The LPM 1660 provides port information to a RouTing Engine (RTE) 1820, which also receives data from FSPF module 1830, either from physical neighbor management based on dedicated E ports (1840) or logical neighbor management based on the base fabric configuration (1850).

The RTE 1820 performs routing table development. In addition, the RTE 1820 uses reachability information received from the FSPF 1830 to create an edge structure or calculate a path. The RTE 1820 passes DRS interfaces to the ASIC driver 1770, which then uses the DRS associated with the interface.

The RTE 1820 in some embodiments includes DRS interfaces and hierarchical routing. In such embodiments, the RTE 1820 treats a DRS like a regular FC interface in terms of accepting reachability information from the FSPF 1830, creating an edge structure or calculating a path. When programming the routes, the RTE 1820 passes the DRS interface to the ASIC driver 1770, which uses the DRS associated with the interface to interact with the hardware 1780.

Hierarchical routing is a feature of one embodiment of the RTE 1820 that maintains sources and destinations at different levels of the hierarchy, such as ports in a chassis, logical sources and destinations connected to the chassis, and LF destinations connected to the fabric, and understands the relationship between entities at different levels. For example, a DRS interface corresponds to a domain in a base fabric and an egress interface corresponds to a port in a chassis. This results in improved load balancing considering the load at the lowest level of hierarchy while calculating routes at a higher level.

The ASIC driver 1770 uses the data received from the RTE 1820 and other information about the physical and logical ports associated with the logical switch received from the LPM 1660 to program the hardware 1780.

In one embodiment, as illustrated in FIG. 19, the LFM 1610 runs as a single-threaded user-level daemon servicing events one at a time from a common message queue 1900. The LFM 1610 includes a controller block 1910 that is responsible for handling all incoming events, such as (1) a DB exchange received event (1902); (2) a LISL create/delete request from a LFM in another chassis (1904); (3) a state change notice (SCN) indicating a change in the base fabric, such as when a domain becomes reachable or unreachable (1906); and (4) IOCtls to handle command line interfaces (CLIs) (1908). The controller block 1910 services each event by coordinating with other blocks within the LFM 1610.

The LFM 1610 also includes a physical fabric database manager 1920, which in some embodiments maintains a physical fabric database 1922 containing information about the base fabric topology and information about each chassis in the multi-chassis base fabric such as configuration, capabilities used, etc. Also maintained is a distance metric for each chassis in the base fabric, based on base fabric topology, to be used by a logical adjacency determination algorithm.

The LFM 1610 also includes a logical topology manager 1930, which determines the list of logical ISLs that the logical switch should establish with other logical switches in the fabric using a logical adjacency determination algorithm and is responsible for maintaining the list of LISLs that have been created for each virtual fabric.

The LFM 1610 also includes a logical link manager 1940, which is responsible for establishing and tearing down logical ISLs with other switches using a LISL creation/deletion protocol and interacting with the logical port manager 1660 to create/delete corresponding logical ports. In one embodiment, the logical link manager 1940 employs a LISL database 1942 for this purpose.

In one embodiment, the physical fabric DB manager 1920, the logical topology manager 1930, and the logical link manager 1940 use a messaging service 1950 to handle messaging between them, the partition manager 1620, and a reliable transmit with response (RTWR) module 1980.

In some embodiments, the LFM 1610 can be implemented as a state machine, as will be understood by one of ordinary skill in the art. Whenever a state machine is uninstantiated because of a remote partition being removed or remote base fabric domain becoming unreachable, the corresponding logical port is uninstantiated and any pending request messages for the LISL with RTWR are cleaned. The logical ports that are created when an LISL is established are registered with the switch driver 1710 as part of the logical switch. A logical port goes into an online state when the creation protocol completes and becomes available for regular control path activities.

The following is a sequence of events that happen when a new partition is created in the local chassis according to one embodiment. The partition manager 1620 begins by creating a new partition with the kernel and allocating appropriate resources. After the partition is created, the partition manager 1620 notifies the LFM 1610 of the new partition along with attributes of the new partition, in particular a partition ID and fabric ID. The LFM 1610 updates its physical fabric database 1922 with the new partition's information and sends an asynchronous configuration update to all virtual fabric-capable chassis in the base fabric. The LFM 1610 runs a logical adjacency determination algorithm for the new partition to identify the set of LISLs to create. The LFM 1610 creates a logical port in the new partition for each LISL that is created, by sending a logical port creation request to the LPM 1660 and specifies the set of physical ports in the base fabric that can be used to reach the peer domain as its children. The LPM 1660 allocates an LIF object in the logical port hierarchy and adds the specified children. It also registers the logical interface id with an interface manager as a DRS type interface. The LPM 1660 allocates a user port number and port index within the partition for the logical port. It then sends a port creation request to the switch driver 1710 for the partition, specifying the logical interface ID.

The LFM 1610 runs a LISL creation protocol with the peer switch. In one embodiment, the LISL creation protocol is a two-way handshake between the requesting LFM 1610 and responding LFMs. The requesting LFM 1610 sends a message containing the FC worldwide name (WWN) of the requesting chassis, a base domain identifier, a virtual fabric ID, the WWN of the logical switch connected to the LISL, and the user port number in the logical switch. Responding LFMs send a message to the requesting LFM 1610 containing the responding chassis's WWN, a base fabric domain identifier, and a user port number. The disclosed protocol is illustrative only and other creation protocols and message contents can be used. The LFM 1610 maintains a database 1942 of LISLs that it has established with other logical switches (as either a requestor or responder) and stores attributes of the LISL in this database 1942. In some embodiments, the LISL database 1942 also contains information for LISLs that are in the process of being established.

In one embodiment, a state machine, illustrated in FIG. 30, is used for each LISL that the LFM 1610 attempts to establish and implements a LISL protocol with peer switches for creating and deleting LISLs. The state is maintained as part of the database of LISLs 1942. The technique used can attempt to establish an LISL with a logical switch that is not yet online. In that event, the state machine and the logical port exist, but the state of the LISL is set to indicate that the peer partition is down. When the peer LFM notifies the LFM 1610 that the peer logical switch is up, then the LFM 1610 can set the LISL as online.

As illustrated in FIG. 30, from a start state 3005, if a LISL creation request is received, the state machine transits to state 3045, indicating a request has been received. If the request is accepted, the state machine transits to state 3040, where the logical port is created, then to state 3075, indicating the logical port is online. If the request is rejected, then the rejection response is sent, and the state machine moves to state 3070, indicating the LISL creation request has been rejected.

If at start state 3005 a LISL creation is to be initiated, then the state machine transits to state 3010, where the logical port is created, then a creation request is sent to the peer LFM, moving the state machine to state 3030, which indicates the creation request has been sent. If the LFM 1610 receives a message indicating the request should be retried, then the state machine transits to state 3015, at which point the request is resent, transiting the state machine back to state 3030. If, as discussed above, the LFM 1610 receives a response indicating the peer partition is down, then the state machine moves to state 3035, where it waits until receiving a peer switch online notification, at which point the state machine moves back to state 3030, to retry the request.

If the response received indicates that the peer LFM is creating the other end of the LISL, then the state machine moves from state 3030 to state 3025, indicating that a peer request is in progress. Upon receiving a creation request from the peer LFM, the state machine moves to state 3045, indicating the LFM 1610 has received the request, and from there to state 3075 as the LFM sets the logical port online. If, while in state 3030, the LFM 1610 receives a LISL creation request, the state machine transits to state 3045, proceeding further as described above.

If the LFM 1610 receives an accepted response to the LISL creation request, then the state machine moves to state 3050, indicating the request has been accepted, and then the LFM 1610 sets the logical port online and moves to state 3075.

If the LFM 1610 gets an error response indicating that the LISL creation request timed out, then the state machine moves to state 3060, indicating the creation request failed. The LFM then retries the request with an increased timeout, moving back to state 3030.

If while in state 3075 the LFM 1610 determines the peer partition is down, then the LFM 1610 sets the logical port down or offline and moves to state 3035, to wait until the peer switch comes online.
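
The transitions described for FIG. 30 can be summarized as a table-driven sketch. This is a simplification under stated assumptions: the state and event names are hypothetical labels rather than the reference numerals of FIG. 30, and the intermediate states (port creation, retry, request accepted, request failed) are collapsed into the states they lead to.

```python
# (current state, event) -> next state; all names are illustrative labels.
TRANSITIONS = {
    ("start", "initiate"): "request_sent",
    ("start", "request_received"): "request_received",
    ("request_received", "accept"): "online",
    ("request_received", "reject"): "rejected",
    ("request_sent", "retry"): "request_sent",
    ("request_sent", "timeout"): "request_sent",  # resent with a longer timeout
    ("request_sent", "peer_down"): "peer_down",
    ("request_sent", "peer_in_progress"): "peer_in_progress",
    ("request_sent", "request_received"): "request_received",
    ("request_sent", "accepted_response"): "online",
    ("peer_in_progress", "request_received"): "request_received",
    ("peer_down", "peer_online"): "request_sent",
    ("online", "peer_down"): "peer_down",
}


def run(events, state="start"):
    """Apply a sequence of events and return the resulting state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not valid in state {state!r}")
        state = TRANSITIONS[key]
    return state
```

For example, run(["initiate", "peer_down", "peer_online", "accepted_response"]) ends in the online state, matching the case in which the peer partition is initially down and the request is retried once the peer switch comes online.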

At the completion of the protocol, the LFM 1610 moves the logical port to online state. The switch driver 1710 sends a port up state change notice (SCN) to all relevant protocol daemons. The FSPF 1830 ignores this event because the port corresponds to an LISL. Other daemons act on the port like a regular port. Thus, a virtual fabric is formed.

The LFM 1610 waits for a fabric formed event for the virtual fabric, and upon receipt of the event notification, sends a data path up SCN to the virtual fabric FSPF 1830 (for the new partition) for each base fabric domain that has a partition in the same fabric. The SCN contains the LIF that was allocated for the base fabric domain. The cost of the link is specified in the SCN and is set to the cost of the corresponding path in the base fabric, as discussed above.

The virtual fabric FSPF 1830 floods link state database (LSDB) updates on all dedicated ISLs. The new link in the FSPF 1830 triggers a path calculation. The FSPF 1830 sends path information messages for all affected paths to the RTE 1820 (via the switch driver 1710 for the partition). The paths may contain LIFs in the reachable interfaces list.

The RTE 1820 creates a DRS edge for each LIF specified in the reachability list for a destination. The RTE 1820 sends add route IOCtls to the ASIC driver 1770 for each new route that is created or affected.

The ASIC driver 1770, when receiving an add route IOCtl, acts based on the destination fabric id, base fabric id and egress interface type. If the destination fabric id is the same as the base fabric id, the egress type is expected to be logical (containing logical interfaces) and the L2 tables are programmed accordingly. If the destination fabric id is different, the L3 routing entry is programmed. If the egress interface type is DRS, the DRS attribute of the interfaces will be used in the encapsulation header. If the egress interface is logical, the egress interface is specified in the L3 routing entry directly. In either case, a flag in the IOCtl specifies whether an encapsulation header should be added. When an egress interface id is specified, the DRS of the local chassis is used.

The virtual fabric FSPF 1830 updates the switch driver 1710 with the new set of interfaces available to send domain controller frames for each affected remote virtual fabric domain.

The switch driver 1710 updates the domain-port table. If the specified interface is a data path LIF, the switch driver 1710 directly stores the LIF instead of the port number in the table and sets a flag saying the entry corresponds to an LIF directly.

When a partition is to be deleted, the partition manager 1620 makes a synchronous call to the LFM 1610 to notify the LFM 1610 of the partition to be deleted. The LFM 1610 removes the partition from its physical fabric database 1922 and sends an asynchronous configuration update to all virtual fabric-capable chassis in the base fabric. The LFM 1610 sends a data path down SCN for each data path link to the virtual fabric FSPF 1830 for the partition. The virtual fabric FSPF 1830 removes the corresponding LSRs, and sends LSDB updates on all dedicated ISLs. The virtual fabric FSPF 1830 will also update the switch driver 1710 with the new set of interfaces available to send domain controller frames for each remote virtual fabric domain (the new set might be empty).

The LFM 1610 runs a LISL teardown protocol for each LISL in the partition and deletes the logical ports associated with the LISL by sending a port delete request to the LPM 1660. The LPM 1660 sends a synchronous port deletion request to the switch driver 1710. The switch driver 1710 sends a port down SCN for the logical port to all relevant protocol daemons. The FSPF 1830 ignores this event because the port corresponds to an LISL. The switch driver 1710 finally removes the logical port.

The LPM 1660 destroys the LIF object, unregisters the LIF id with the interface database, and notifies the LFM 1610 when the cleanup is complete. The LFM 1610 returns to the partition manager 1620. The partition manager 1620 deletes the partition with the kernel, completing the deletion process.

FIGS. 20-21 illustrate frame transmit and receive paths and control paths according to one embodiment, as frames pass through a system configured for virtual fabrics. Frames are passed in Information Unit (IU) data structures that are received or sent by the FC protocol layer 1640. FIG. 20 illustrates a frame transmit path when a frame is received on a logical port and the way the LPM 1660 gets involved in the flow path. A frame received on a regular port takes a conventional path through the switch, but a frame received on a logical port is tunneled through XISLs in the base logical switch. When daemons need to send a frame, they send a command to the FC Driver 1750. The frame is received by the FC driver 1750 in block 2010, and is then sent through the Switch driver 1710 in block 2020.

The Switch driver 1710 then determines in block 2030 whether the frame was received on a logical port. If so, then the switch driver 1710 invokes the LPM 1660 in block 2040, indicating the frame was received as a FC frame. The LPM 1660 invokes an LIF OPS element in block 2050, which finds a physical port of the LIF on which to transmit the frame, passing that information back to the LPM 1660. The LPM 1660 invokes the tunnel driver 1760 in block 2060 to encapsulate the frame. The tunnel driver 1760 invokes a blade driver in block 2070 to transmit the frame, which passes the frame to the ASIC driver 1770 in block 2080 for transmission. If the frame was received on a regular port, then the frame is passed to the blade driver from the switch driver 1710 without invoking the LPM 1660.

Although the previous description is written in the context of software, as is described below, the encapsulation of frames described above may be performed by hardware in the ASIC instead of the various drivers described above. In one embodiment, firmware drivers may perform these actions for control path frames originating in the CPU of the switch, while the hardware performs these actions for datapath frames, performing encapsulation and decapsulation in addition to L3 routing.

FIG. 21 illustrates a frame receive path in a similar embodiment. When a frame is received by the ASIC driver 1770 in block 2110, the ASIC driver 1770 checks the R_CTL field of the IU header in block 2120. If the R_CTL field indicates an FC over FC frame, indicating a tunneled frame, the frame is sent to the tunnel driver 1760 in block 2130 by invoking the LPM 1660 in block 2140. The LPM 1660 then invokes the LIF OPS element in block 2150 to invoke the switch driver 1710 in block 2160. At that point, the frame is processed as if it were received on a physical port, passing the frame through the Fibre Channel driver 1750 in block 2180. If the frame is not FC over FC, indicating it does not involve a logical port, the ASIC driver 1770 invokes the blade driver in block 2170, which then invokes the switch driver 1710 in block 2160 for normal processing of the frame through the Fibre Channel driver in block 2180 as in conventional switches.

Turning to FIG. 22, a flowchart illustrates one embodiment of a technique for encapsulating a frame that passes across a LISL. In block 2210, the size of the frame is validity checked. If valid, then in block 2220, the FID is determined by querying the LIF, and validity checked in block 2225. If valid, the transmit domain is determined by querying the LIF, and validity checked in block 2235. If any of the validity checks of blocks 2210, 2225, or 2235 fail, an error is returned in block 2215.

Then in block 2240, the source domain is obtained by querying the LIF. A new frame is allocated in block 2245 big enough to hold the original frame and the encapsulation header and the IFR header. In block 2250 the encapsulation header is created in the new frame, then in block 2260, the header and data of the original frame are copied to the new frame. The memory holding the original header is then freed in block 2265 and a pointer to the new frame is set in block 2270. In block 2275, a logical interface is obtained for the exit port, and the process is completed by invoking the LPM 1660 to transmit the encapsulated frame over the logical interface.
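The encapsulation steps of FIG. 22 may be sketched, purely as an illustration, as follows. The class names, the field names, and the three limits (MAX_FRAME_SIZE, MAX_FID, MAX_DOMAIN) are assumptions introduced for this sketch and are not taken from the embodiments; memory allocation and freeing (blocks 2245, 2265) are left to the Python runtime.

```python
from dataclasses import dataclass
from typing import Optional

MAX_FRAME_SIZE = 2148  # assumed size limit in bytes, for illustration only
MAX_FID = 4095         # assumed: FID treated as a 12-bit value
MAX_DOMAIN = 239       # assumed valid domain range 1..239


@dataclass
class Lif:
    """Hypothetical logical interface record queried during encapsulation."""
    fid: int
    tx_domain: int
    src_domain: int


@dataclass
class EncapsulatedFrame:
    enc_src: int   # Enc header: source domain
    enc_dst: int   # Enc header: transmit (destination) domain
    ifr_fid: int   # IFR header: fabric identifier
    inner: bytes   # original header and data, copied unchanged


def encapsulate(frame: bytes, lif: Lif) -> Optional[EncapsulatedFrame]:
    """Follow blocks 2210-2270; return None where FIG. 22 returns an error."""
    if not 0 < len(frame) <= MAX_FRAME_SIZE:     # block 2210: size check
        return None                              # block 2215: error
    if not 0 < lif.fid <= MAX_FID:               # blocks 2220, 2225: FID check
        return None
    if not 0 < lif.tx_domain <= MAX_DOMAIN:      # block 2235: domain check
        return None
    # Blocks 2240-2270: obtain the source domain, build the new frame,
    # and copy the original header and data into it.
    return EncapsulatedFrame(lif.src_domain, lif.tx_domain, lif.fid, bytes(frame))
```

A caller would then hand the returned frame to the logical interface for transmission, corresponding to block 2275.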

FIG. 23 illustrates an embodiment of a similar technique for decapsulating a frame passing across an LISL. In block 2310, the R_CTL field of the frame is obtained, and then in block 2315 it is validity checked by checking to see if the header is an encapsulated header. If not, then in block 2320 an error is indicated. If the header is valid, then in block 2325 the FID is obtained from the IFR header. In block 2330, a new frame is allocated big enough to hold the decapsulated frame. The header is copied to the new frame in block 2335, and the payload data in block 2340. The memory holding the encapsulated frame is freed in block 2345 and a pointer is set to the new decapsulated frame in block 2350. In block 2355, the LPM 1660 is invoked to determine the interface id associated with the new frame, and then in block 2360, the LPM 1660 is invoked to determine the port to be associated with the new frame, based on the interface id and the FID. The port value is stored in the decapsulated frame in block 2365 and the LPM 1660 is then invoked in block 2370 to handle the decapsulated frame.
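The decapsulation steps of FIG. 23 may likewise be sketched as follows. This is an illustrative model only: the R_CTL_ENC value, the class names, the interface_id_of callback, and the port_table dictionary are hypothetical stand-ins for the LPM 1660 lookups and are not part of the described embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

R_CTL_ENC = 0x52  # assumed R_CTL value marking an encapsulation header


@dataclass
class TunneledFrame:
    r_ctl: int            # R_CTL field of the outer header
    ifr_fid: int          # FID carried in the IFR header
    inner_header: bytes   # original FC header
    inner_payload: bytes  # original payload


@dataclass
class DecapsulatedFrame:
    header: bytes
    payload: bytes
    port: int             # port value stored in block 2365


def decapsulate(
    frame: TunneledFrame,
    interface_id_of: Callable[[TunneledFrame], int],
    port_table: Dict[Tuple[int, int], int],
) -> Optional[DecapsulatedFrame]:
    """Follow blocks 2310-2365; return None where FIG. 23 indicates an error."""
    if frame.r_ctl != R_CTL_ENC:                 # blocks 2310, 2315
        return None                              # block 2320
    fid = frame.ifr_fid                          # block 2325
    header = bytes(frame.inner_header)           # blocks 2330, 2335
    payload = bytes(frame.inner_payload)         # block 2340
    interface_id = interface_id_of(frame)        # block 2355
    port = port_table[(interface_id, fid)]       # block 2360
    return DecapsulatedFrame(header, payload, port)
```

In this sketch an unknown (interface id, FID) pair raises a KeyError; how an actual embodiment handles that case is not specified above.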

As discussed previously, the protocol for transmitting virtual fabric frames over the base fabric involves adding two extended headers to each data frame that is transmitted over a LISL: an inter-fabric routing extended header (IFR header), and the encapsulation extended header (Enc header). The IFR header provides the necessary information to support fabric-to-fabric routing. The information includes: (a) the fabric identifier of the destination fabric (DF_ID); (b) the fabric identifier of the source fabric (SF_ID); and (c) information appropriate to determine the expiration time or hop count. The encapsulation extended header is used to transmit frames between Inter-fabric Routers in a multi-chassis virtual fabric.

In the case where a data frame is transmitted over a logical ISL to a logical switch that is connected through the base fabric, each data frame first has the IFR header prepended to the frame, and then the Enc header prepended to the IFR header. The encapsulation header allows the frame to be sent as a normal frame through the base fabric, and allows normal L2 switches to exist in the base fabric. The destination address in the encapsulation header is set to the DRS address of the closest base logical switch to allow for further routing if the destination logical switch is not connected through the base fabric. The IFR header allows the frame to be matched to a particular logical switch once received by the destination switch. A simple example of a possible LF topology and a high-level abstraction of the data path and the frame can be seen in FIGS. 24 and 25.

In the example topology of FIG. 24, a logical switch 2410 has an FID of X. The logical switch is connected to the rest of virtual fabric X (logical switches 2460 and 2430) over LISLs connected via XISLs between base fabric switches 2420, 2450, and 2440, which are not part of the logical fabric X.

Turning to FIG. 25, a frame 2510 from logical switch 2410 has an FC header 2512 indicating that the source is logical switch 2410, and the destination is logical switch 2430, and a payload 2514. The tunnel driver 1760 then encapsulates the original frame 2510, producing encapsulated frame 2520 by adding the IFR header 2524 to indicate the virtual fabric associated with this frame (FID X in FIG. 25), and then adding Enc header 2522, indicating that the source switch is base logical switch 2420, and destination switch is base logical switch 2440. Encapsulated frame 2520 is routed through the base fabric, in this example through base logical switch 2450. Upon receipt by base fabric switch 2440, the tunnel driver 1760 is invoked to decapsulate frame 2520, producing the original frame 2510 again, which is then passed to logical switch 2430.

In the situation where a logical destination switch does not have a base fabric partition on the local chassis, then the frame is sent to the closest base logical switch partition's DRS address and the hardware strips off the Enc header and IFR header and forwards the frame to the destination. FIGS. 26 and 27 illustrate a topology for this scenario and a high-level abstraction of the frame data flow.

In the example topology of FIG. 26, logical switch 2610 is part of virtual fabric X and is connected to the rest of the virtual fabric X by an LISL tunneled through an XISL connected between base fabric switches 2620 and 2630. Logical switch 2650, also part of virtual fabric X, is connected via a DISL to base logical switch 2630, and can be a legacy L2 switch that does not support virtual fabrics. As shown by FIG. 27, as in FIG. 25, the original frame 2710 is sent from logical switch 2610 with an FC header 2712, indicating the source is logical switch 2610 and the destination is logical switch 2650, and a payload 2714. As in FIG. 25, the tunnel driver 1760 adds an IFR header 2724 indicating virtual fabric X, and an Enc header 2722 indicating the source is base logical switch 2620 and the destination is base logical switch 2630, producing encapsulated frame 2720. The encapsulated frame 2720 is then routed through the base fabric to base logical switch 2630, which decapsulates the frame, and sends original frame 2710 on to logical switch 2650.

In some embodiments, the virtual fabric design allows for mapping VSAN ports to logical F Ports, which are assigned to a logical switch, so each logical F Port on a port can be mapped to a particular logical switch. The tunnel driver 1760 adds or removes a VFT header, as defined in the Fibre Channel Switch Fabric-4 (FC-SW-4) work draft standard, from VSAN capable ports and maps the frame to a logical F Port. Basic frame flow examples can be seen in FIGS. 28 and 29.

FIG. 28 illustrates an example frame flow for an outbound logical F port. A frame 2810 is sent from logical switch AA with FC header 2812 indicating a source of logical switch AA and a destination of N_PID. The frame 2810 is sent through the tunnel driver to add a VFT header 2822 indicating VF_ID of x, producing frame 2820. The frame 2820 is then sent over the logical F port for delivery. FIG. 29 illustrates an example frame flow for an inbound logical F port, which reverses the procedure of FIG. 28. Upon receipt of the encapsulated frame 2910, which includes a VFT header 2912 indicating VF_ID x, an FC header 2922 indicating the source is N_PID and the destination is logical switch AA, and a payload 2924, the tunnel driver 1760 decapsulates the frame 2910, producing frame 2920, with only the FC header 2922 and payload 2924, which can then be sent through the base fabric to the logical switch AA for delivery.

Turning to FIGS. 31-33, we now put all of the pieces together to show an example of how a network 3100 of three physical switches 3110, 3120, and 3130 can be partitioned and connected into multiple virtual fabrics. FIG. 31 illustrates the physical switches 3110, 3120, and 3130, and end devices 3140-3195. Ports 3, 5, 6, and 12 are illustrated as defined in switch chassis 3110. Host 3140 (H1) is connected to port 3, host 3150 (H2) is connected to port 5, and storage system 3170 (ST1) is connected to port 12. Switch chassis 3120 is shown with ports 3-4 and 6-8, with host 3160 (H3) connected to port 4 and storage system 3180 (ST2) connected to port 6. Finally, switch 3130 is shown with ports 0, 1, 7, and 9, with storage system 3190 (ST3) connected to port 7 and storage system 3195 (ST4) connected to port 9. Other ports can be defined in switch chassis 3110-3130, but are omitted for clarity of the drawing. Although three switch chassis are shown in FIGS. 31-33, any desirable number of switch chassis can be connected into a switch network according to the disclosed embodiments.

FIG. 32 illustrates partitioning the three physical switches 3110-3130, and assigning ports to the logical switches. Switch chassis 3110 is partitioned into logical switches 3210 (LSW1), 3220 (LSW2), and 3230 (BF1). Logical switch 3230 is designated as a base switch in the partition configuration. Switch chassis 3120 is partitioned into logical switches 3240 (LSW3), 3250 (LSW4), and 3260 (BF2), with logical switch 3260 designated as a base switch. Switch chassis 3130 is partitioned into logical switches 3270 (LSW5), 3280 (LSW6), and 3290 (BF3), with logical switch 3290 designated as a base switch. The number of logical switches shown in FIG. 32 is by way of example and illustrative only, and although each of the switch chassis 3110-3130 is shown as partitioned into three logical switches, as disclosed above, any desired number of logical switches can be defined in a switch chassis. No significance should be given to the arrangement of the logical switch assignments in FIG. 32. For example, any of the logical switches of a partitioned switch chassis can be designated as a base switch.

In addition to the logical switch partitioning, FIG. 32 illustrates an example of assigning ports to the logical switches, with the port number assignments in the logical switches shown in dashed lines when associated with a port of the physical switch chassis, and in dotted lines when a logical port. Logical switch 3210 is assigned port 3 of the switch chassis 3110, with port 3 assigned as port 1 of the logical switch 3210. Similarly, port 5 of the switch chassis 3110 is assigned to logical switch 3220 as port 3, port 12 is assigned to logical switch 3220 as port 2, and port 6 is assigned to base switch 3230 as port 1. As with the port number assignments of the switch chassis, the port number assignments of the logical switches are illustrative and by way of example only. In addition, logical port 2 is defined in logical switch 3210 and logical port 1 is defined in logical switch 3220. As shown in FIG. 32 and described above, the logical ports are not associated with a physical port of the switch chassis 3110.

Similarly, in switch chassis 3120, physical ports 4 and 6 are assigned to ports 1 and 3, respectively, of logical switch 3240. Two logical ports 2 and 4 are also defined in logical switch 3240. Physical port 7 of the switch chassis is assigned as port 1 of logical switch 3250. Physical ports 5 and 8 are assigned as ports 1 and 2, respectively, in base switch 3260. Likewise, in switch chassis 3130, ports 0 and 7 are assigned to logical switch 3270 as ports 1 and 3, respectively, port 9 of the chassis 3130 is assigned to logical switch 3280 as port 1, and port 1 of the chassis 3130 is assigned to base switch 3290 as port 1. In each of logical switches 3270 and 3280, a logical port 2 is assigned to the logical switch, with no associated physical port.

As illustrated in the example partitioning and assignments of FIG. 32, port numbers assigned to logical switches do not necessarily have the same port number assignment as the port number for the switch chassis, and different logical switches may have ports defined with the same port number as other logical switches in the same chassis. Where end devices are attached to physical ports, those logical switches will process traffic to and from the end device using the port number assignment of the logical port.
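The physical-port assignments of FIG. 32 described above may be captured, for illustration only, in a small lookup table. The table layout and the two helper functions are hypothetical; the port numbers are those recited above, and logical ports (which have no physical port) are intentionally omitted.

```python
from typing import Dict, List, Tuple

# chassis -> {physical port: (logical switch, port number within that switch)}
PORT_MAP: Dict[int, Dict[int, Tuple[int, int]]] = {
    3110: {3: (3210, 1), 5: (3220, 3), 12: (3220, 2), 6: (3230, 1)},
    3120: {4: (3240, 1), 6: (3240, 3), 7: (3250, 1), 5: (3260, 1), 8: (3260, 2)},
    3130: {0: (3270, 1), 7: (3270, 3), 9: (3280, 1), 1: (3290, 1)},
}


def resolve(chassis: int, physical_port: int) -> Tuple[int, int]:
    """Map a chassis port to its owning logical switch and logical-switch port."""
    return PORT_MAP[chassis][physical_port]


def switches_using_port_number(chassis: int, number: int) -> List[int]:
    """List logical switches in one chassis that define the given port number."""
    return sorted(ls for ls, port in PORT_MAP[chassis].values() if port == number)
```

For example, resolve(3110, 5) yields logical switch 3220, port 3, while switches_using_port_number(3120, 1) shows that three logical switches in chassis 3120 each define a port 1.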

Turning to FIG. 33, we see an assignment of the logical switches to virtual fabrics, and inter-switch links connecting the various logical switches. Logical switches 3210, 3240, and 3270 are all assigned to virtual fabric 3310 (VF1), logical switches 3220 and 3280 are assigned to virtual fabric 3320 (VF2), logical switch 3250 is assigned to virtual fabric 3330 (VF3), and base switches 3230, 3260, and 3290 are assigned to base fabric 3340 (BF). Note that no logical switches are assigned to virtual fabric 3330 (VF3) in switch chassis 3110 and 3130.

In addition to the virtual fabrics, FIG. 33 illustrates an example of various inter-switch links, with XISLs shown as dashed lines, and LISLs shown as dotted lines. XISLs 3360 and 3365 connect base switches 3230, 3260, and 3290, with XISL 3360 connecting port 1 of base switch 3230 to port 1 of base switch 3260, and XISL 3365 connecting port 2 of base switch 3260 to port 1 of base switch 3290. These XISLs are used for transporting data for the LISLs 3370, 3375, and 3377 defined in the network 3100. LISL 3370 connects logical port 2 of logical switch 3210 to logical port 2 of logical switch 3240, LISL 3375 connects logical port 4 of logical switch 3240 to logical port 2 of logical switch 3270, and LISL 3377 connects logical port 1 of logical switch 3220 to logical port 2 of logical switch 3280. Additional LISLs can be defined to complete a full mesh of the virtual fabric 3310 if desired. As explained in more detail above, LISLs 3370, 3375, and 3377 use services provided by the base switches 3230, 3260, and 3290 to tunnel data across XISLs 3360 and 3365. Note that although logical switch 3250 is part of the same physical chassis 3120, because it is assigned to virtual fabric 3330 (VF3), and is not part of virtual fabric 3310 or 3320, none of the traffic passing through LISLs 3370, 3375, and 3377 is seen by logical switch 3250. Likewise, none of the traffic for virtual fabric 3310 is seen by the logical switches of virtual fabric 3320 and vice versa. No DISLs are defined in the illustration of FIG. 33.

Thus, for example, if host 3150 needs data from storage system 3195, that data will traverse the connection from storage system 3195 to port 1 of logical switch 3280, then go via LISL 3377 from port 2 of logical switch 3280 to port 1 of logical switch 3220, and finally via port 3 to host 3150. As described in detail above in the discussion of FIGS. 24 and 25, the traffic between storage system 3195 and host 3150 for LISL 3377 is routed across XISLs 3360 and 3365 using the additional headers added to the frames. The logical switch 3220 puts FC headers on frames that specify the source as logical switch 3220 and destination as logical switch 3280, then passes the frames to the base switch 3230. The tunnel driver 1760 of base switch 3230 encapsulates frames going to logical switch 3280 to include an ENC header specifying a source of base switch 3230 and a destination of base switch 3290, and an IFR header specifying the fabric ID of virtual fabric 3320. The frames are then routed across the base fabric 3340. Even though the frames pass through base switch 3260 en route to base switch 3290, no device connected to base switch 3260 sees that traffic. Upon receipt by the base switch 3290, the frames are decapsulated by the tunnel driver 1760 of base switch 3290 to remove the ENC and IFR headers, before delivering the frames to logical switch 3280.

Similarly, if host 3140 requests data from storage system 3180, the request and response can go over LISL 3370, traversing XISL 3360. The tunnel drivers 1760 of base switches 3230 and 3260 encapsulate and decapsulate the frames with ENC headers specifying the source and destination base switches and IFR headers specifying virtual fabric 3310.

If host 3160 requests data from storage system 3190, the request and response will go over LISL 3375, traversing XISL 3365. The tunnel drivers 1760 of base switches 3260 and 3290 encapsulate and decapsulate the frames with ENC headers specifying the source and destination base switches and IFR headers specifying virtual fabric 3310.

If host 3160 needs data from storage system 3190, the traffic will traverse LISL 3375 and XISL 3365, but not XISL 3360. As before, the tunnel drivers of base switches 3260 and 3290 will encapsulate and decapsulate frames with ENC headers specifying the source and destination base switches 3260 and 3290, and an IFR header specifying virtual fabric 3310.

Storage system 3195 is invisible to host 3140, as is any other end device connected to a logical switch assigned to a different virtual fabric than virtual fabric 3310.
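The isolation between virtual fabrics in the example of FIGS. 31-33 may be modeled, as an illustration only, by two lookup tables built from the assignments recited above. The table names and the visible() helper are hypothetical and are not part of the described embodiments.

```python
from typing import Dict

# Logical switch -> fabric, per the assignments of FIG. 33.
FABRIC_OF_SWITCH: Dict[int, str] = {
    3210: "VF1", 3240: "VF1", 3270: "VF1",
    3220: "VF2", 3280: "VF2",
    3250: "VF3",
    3230: "BF", 3260: "BF", 3290: "BF",
}

# End device -> logical switch, derived from the port assignments of FIG. 32.
SWITCH_OF_DEVICE: Dict[int, int] = {
    3140: 3210,  # H1
    3150: 3220,  # H2
    3170: 3220,  # ST1
    3160: 3240,  # H3
    3180: 3240,  # ST2
    3190: 3270,  # ST3
    3195: 3280,  # ST4
}


def visible(device_a: int, device_b: int) -> bool:
    """Two end devices see each other only within the same virtual fabric."""
    fabric_a = FABRIC_OF_SWITCH[SWITCH_OF_DEVICE[device_a]]
    fabric_b = FABRIC_OF_SWITCH[SWITCH_OF_DEVICE[device_b]]
    return fabric_a == fabric_b
```

Under this model, host 3150 and storage system 3195 are mutually visible (both in VF2), whereas host 3140 (VF1) cannot see storage system 3195.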

Although some of the above description is written in terms of software or firmware drivers, the encapsulation and decapsulation can be performed in hardware of the ASIC instead of software or firmware.

In one embodiment, end devices and logical switches may be in only a single virtual fabric.

In one embodiment illustrated in FIG. 34, the functionality for partitioning a network switch into multiple logical switches described above is implemented in hardware as a 40-port Fibre Channel switch ASIC 3410 that is combinable with a host processor subsystem 3420 to provide a complete 40-port Fibre Channel switch chassis 3400. Multiple ASICs 3410 can be arranged in various topologies to provide higher port count, modular switch chassis.

The ASIC 3410 comprises four major subsystems at the top level as shown in FIG. 34: a Fibre Channel Protocol Group Subsystem 3430, a Frame Storage Subsystem 3440, a Control Subsystem 3450, and a Host System Interface 3460. Some features of the ASIC 3410 that are not relevant to the current discussion have been omitted for clarity of the drawing.

The Fibre Channel Protocol Group (FPG) Subsystem 3430 comprises 5 FPG blocks 3435, each of which contains 8 port and SERDES logic blocks to support a total of 40 E, F, and FL ports.

The Frame Data Storage (FDS) Subsystem 3440 contains the centralized frame buffer memory and associated data path and control logic for the ASIC 3410. The frame memory is separated into two physical memory interfaces: a header memory 3442 to hold the frame header and a frame memory 3444 to hold the payload. In addition, the FDS 3440 includes a sequencer 3446, a receive FIFO buffer 3448 and a transmit buffer 3449.

The Control Subsystem 3450 comprises a Buffer Allocation unit (BAL) 3452, a Header Processor Unit (HPU) 3454, a Table Lookup Unit (Table LU) 3456, a Filter 3458, and a Transmit Queue (TXQ) 3459. The Control Subsystem 3450 contains the switch control path functional blocks. All arriving frame descriptors are sequenced and passed through a pipeline of the HPU 3454 and filtering blocks 3458 until they reach their destination TXQ 3459. The Control Subsystem 3450 carries out L2 switching, FCR, LUN Zoning, LUN redirection, Link Table Statistics, VSAN routing and Hard Zoning.

The Host System Interface 3460 provides the host subsystem 3420 with a programming interface to the ASIC 3410. It includes a Peripheral Component Interconnect Express (PCIe) Core 3462, a DMA engine 3464 to deliver frames and statistics to and from the host, and a top-level register interface block 3466. As illustrated in FIG. 34, the ASIC 3410 is connected to the Host Processor Subsystem 3420 via a PCIe link controlled by the PCIe Core 3462, but other architectures for connecting the ASIC 3410 to the Host Processor Subsystem 3420 can be used.

Some functionality described above can be implemented as software modules in an operating system running in the host processor subsystem 3420. This typically includes functionality such as the partition manager 1620 and the LFM 1610 that allow creation and independent management of the logical switches that are defined for the ASIC 3410, including user interface functions, such as a command line interface for management of a logical switch.

Serial data is recovered by the SERDES of an FPG block 3435 and packed into ten (10) bit words that enter the FPG subsystem 3430, which is responsible for performing 8b/10b decoding, CRC checking, min and max length checks, disparity checks, etc. The FPG subsystem 3430 sends the frame to the FDS subsystem 3440, which transfers the payload of the frame into frame memory and the header portion of the frame into header memory. The location where the frame is stored is passed to the control subsystem, and is used as the handle of the frame through the ASIC 3410. The Control subsystem 3450 reads the frame header out of header memory and performs routing, classification, and queuing functions on the frame. Frames are queued on transmit ports based on their routing, filtering and QoS. Transmit queues de-queue frames for transmit when credits are available to transmit frames. When a frame is ready for transmission, the Control subsystem 3450 de-queues the frame from the TXQ for sending through the transmit FIFO back out through the FPG 3430.

The Header Processing Unit (HPU) 3454 performs header processing for a variety of applications through a programmable interface to software, including (a) Layer 2 switching, (b) Layer 3 routing (FCR) with complex topology, (c) Logical Unit Number (LUN) remapping, (d) LUN zoning, (e) Hard zoning, (f) VSAN routing, (g) Selective egress port for QoS, and (h) End-to-end statistics.

FIG. 35 is a block diagram illustrating one embodiment of the HPU 3454 of FIG. 34. To achieve per-frame processing with different applications, two lookup tables (3502 and 3504) are provided, each with its own search engine (3506 and 3508, respectively) that performs a key match search into different segments for the different applications applied to the frame. One larger lookup table (3502) serves all applications except hard zoning, for which entries are stored in the other, smaller lookup table (3504).

The application type is determined by the frame's DID upon receipt. The HPU 3454 then picks up a key from the frame based on the type of frame (L2 or L3) and application, looks for a key match in the appropriate lookup table, and processes the lookup results after the search. Some applications can be combined with one another. For example, if the frame's DID is destined to a remote fabric after remapping, then the second lookup to translate the frame's DID is performed by a loop-back mechanism within the HPU block 3454.

The HPU 3454 is partitioned into six sub-blocks that serve four major functions including application determination, table lookup, routing and frame editing. FIG. 35 is a block diagram illustrating these sub-blocks according to one embodiment. Upon receiving a frame, the action block (ACT) 3510 retrieves a frame header from switch memory, determines the type of application, and writes key information to key memory for lookup 3513. Then a Frame Transformation Block (FTB) 3514 processes lookup results and writes edit words into edit memories 3516 for later use by a frame editor (FED) 3518. If hard zoning is required, the frame is passed to the advanced performance monitoring (ACL) block 3520 after routing is done. Depending on the type of application, the frame may or may not require frame editing by the FED 3518. If no lookup is required, the frame is passed directly to the routing block (RTE) 3522 for normal Layer 2 switching, bypassing frame editing at the end.

The basic function of the ACT block 3510 is to process frame requests received from the Sequencer (SEQ) 3446, capture relevant fields from the frame header, perform a look-up in the Action Table and forward the information to either the RTE 3522 or the FTB 3514.

The ACT block 3510 receives frame processing requests from the SEQ 3446. The ACT block 3510 then reads the frame header from the FDS, using the RxPort and DID fields of the frame header to determine the type of processing required. If L3-level (e.g. FCR) processing is required, then the ACT block 3510 forwards relevant frame header information to the FTB block 3514. Otherwise, the information is forwarded to the RTE block 3522. Frame information needed for Hard Zoning is also forwarded to the ACL block 3520 by passing key information to a key memory for the ACL block 3512.
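The forwarding decision of the ACT block 3510 described above may be sketched, for illustration only, as a table lookup keyed on the RxPort and DID fields. The ActionTable layout, the default for unmatched keys, and the act_forward() name are assumptions made for this sketch; the actual Action Table format is not specified above.

```python
from typing import Dict, Set, Tuple

# Hypothetical action table: (RxPort, DID) -> (needs L3 processing, needs hard zoning)
ActionTable = Dict[Tuple[int, int], Tuple[bool, bool]]


def act_forward(rx_port: int, did: int, table: ActionTable) -> Set[str]:
    """Return the blocks to which the ACT block forwards frame information."""
    # Assumption: an unmatched key falls back to plain Layer 2 switching.
    needs_l3, needs_hard_zoning = table.get((rx_port, did), (False, False))
    # L3-level (e.g. FCR) processing goes to the FTB; otherwise to the RTE.
    targets = {"FTB"} if needs_l3 else {"RTE"}
    if needs_hard_zoning:
        # Hard zoning information is additionally forwarded to the ACL block.
        targets.add("ACL")
    return targets
```

A frame whose key selects L3 processing is thus forwarded to the FTB, while all other frames go to the RTE, with the ACL block added whenever hard zoning applies.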

The ACT block 3510 also performs Extended Link Service (ELS)/Basic Link Service (BLS) frame classification and forwards the required information to the FTB 3514 and RTE 3522.

In summary, the HPU 3454 provides hardware capable of encapsulating and routing frames across inter-switch links that are connected to the ports 3435 of the ASIC 3410, including the transport of LISL frames that are to be sent across an XISL. The HPU 3454 performs frame header processing and Layer 3 routing table lookup functions using routing tables where routing is required, encapsulating the frames based on the routing tables, and routing encapsulated frames. The HPU 3454 can also bypass routing functions where normal Layer 2 switching is sufficient.

Thus, the ASIC 3410 can use the HPU 3454 to perform the encapsulation, routing, and decapsulation, by adding or removing IFR headers to allow frames for a LISL to traverse an XISL between network switches in a virtual fabric as described above and illustrated in FIGS. 25 and 27 at hardware speeds. Similarly, VSAN traffic can be routed by the HPU 3454's encapsulation and decapsulation of frames with VFT headers, as described above and illustrated in FIGS. 28 and 29.

In conclusion, the embodiments described above provide the ability to partition a chassis into a plurality of logical switches, then assign ports of the physical switch fabric to the logical switches. Virtual fabrics can then be defined across multiple chassis, with inter-switch links connecting the logical switches in the multiple chassis. A particular logical switch in each partitioned chassis is designated as a base switch, and collections of base switches form base virtual fabrics.

The links that connect switches in the virtual fabrics can be DISLs connecting physical ports that are assigned to the logical switches, XISLs that connect physical ports of the base switches, and LISLs that connect logical ports defined in the logical switches. The LISLs have no physical connection between endpoints of their own, but tunnel through the XISLs of their associated base switches.

Thus, devices can be connected to separate physical chassis, but behave as if they are connected to a single virtual chassis. This allows connecting multiple collections of hosts and storage units in a flexible, convenient, and manageable way, while maintaining separation of traffic, so that each collection of devices in a virtual fabric is invisible to the devices associated with the other virtual fabrics, even when using a common physical XISL link for the transport of traffic tunneled through the base fabric logical switches.

The LISLs connect logical ports of the logical switches by tunneling through XISLs that physically connect base switches of the base fabric.

While certain example embodiments have been described in detail and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative, and that other embodiments may be devised without departing from the basic scope thereof, which is determined by the claims that follow.

1. A method of managing a network switch, comprising: partitioning afirst network switch into a first plurality of logical switches; andmanaging each of the plurality of logical switches independent of eachother of the plurality of the first plurality of logical switches. 2.The method of claim 1, wherein partitioning a first network switchcomprises: dedicating a resource of the first network switch to alogical switch of the first plurality of logical switches.
 3. The methodof claim 1, further comprising: isolating data traffic through a firstlogical switch of the first plurality of logical switches from the otherlogical switches of the first plurality of logical switches.
 4. Themethod of claim 1, further comprising: defining a link between a firstlogical switch of the first plurality of logical switches and a secondswitch; and communicating data between the first logical switch and thesecond switch.
 5. The method of claim 4, wherein the second switch is asecond network switch.
 6. The method of claim 4, further comprising:partitioning a second network switch into a second plurality of logicalswitches, wherein the second switch is a logical switch of the secondplurality of logical switches.
 7. A method of partitioning networkswitches, comprising: partitioning the first network switch into a firstplurality of virtual switch fabrics; partitioning the first networkswitch into a first plurality of logical switches; and associating thefirst logical switch with a first virtual fabric of the first pluralityof virtual switch fabrics.
 8. The method of claim 7, further comprising:partitioning a second network switch into a second plurality of virtualswitch fabrics; defining a multi-chassis virtual fabric, comprising thefirst virtual fabric of the first plurality of virtual switch fabricsand a second virtual fabric of the second plurality of virtual switchfabrics; and configuring the first virtual fabric and the second virtualfabric as a multi-chassis virtual fabric.
 9. The method of claim 8,further comprising: partitioning the second network switch into a secondplurality of logical switches; associating a second logical switch ofthe second plurality of logical switches with the multi-chassis virtualfabric; and communicating data between the first logical switch and thesecond logical switch across the multi-chassis virtual fabric.
 10. Anetwork switch, comprising: a switch, partitionable into a plurality oflogical switches, wherein each of the plurality of logical switches is acomplete and self-contained network switch; a processor; a storagemedium, connected to the processor; a chassis management system, storedon the storage medium, wherein the chassis management system, that whenexecuted by the processor causes the processor to perform actions thatare associated with the switch as a whole; and a logical switchmanagement system, stored on the storage medium, wherein the logicalswitch management system, that when executed by the processor causes theprocessor to perform actions associated with any of the plurality oflogical switches.
 11. The network switch of claim 10, wherein the switchcomprises a plurality of network resources, wherein each of theplurality of logical switches is assigned a network resource of theplurality of network resources, wherein a first logical switch of theplurality of logical switches is defined as a default logical switch,and wherein the default logical switch is assigned any of the networkresources not assigned to any other logical switch.
 12. The network switch of claim 11, wherein the chassis management system comprises: a logical fabric manager, configured to create and maintain a virtual fabric topology, comprising: a controller, configured to handle incoming events; a fabric database, stored in the storage medium; a fabric database manager, configured to store configuration information for virtual fabrics in the fabric database; a logical topology database, stored in the storage medium; a logical topology manager, configured to store topology information for each virtual fabric in the logical topology database; a logical link database, stored in the storage medium; and a logical link manager, configured to store information about logical links associated with the plurality of logical switches in the logical link database.
 13. The network switch of claim 10, wherein the switch is partitionable into a plurality of virtual fabrics, wherein each of the plurality of logical switches is assigned to one of the plurality of virtual fabrics.
 14. The network switch of claim 13, wherein each of the virtual fabrics comprises: a virtual fabric identifier, wherein the virtual fabric identifier is associated with each of the plurality of logical switches to assign that logical switch to a virtual fabric.
 15. A computer readable medium on which is stored software for partitioning a network switch, the software for instructing a processor of the network switch to perform actions comprising: partitioning the network switch into a first plurality of logical switches; and managing each of the first plurality of logical switches independent of each other of the first plurality of logical switches.
 16. The computer readable medium of claim 15, wherein the actions further comprise: defining a link between a first logical switch of the first plurality of logical switches and a second switch; and communicating data between the first logical switch and the second switch.
 17. The computer readable medium of claim 15, wherein the actions further comprise: partitioning a second network switch into a second plurality of logical switches; and defining a link between a first logical switch of the first plurality of logical switches and a second switch, wherein the second switch is a logical switch of the second plurality of logical switches.
 18. The computer readable medium of claim 15, wherein the actions further comprise: partitioning the first network switch into a first plurality of virtual switch fabrics; and associating a first logical switch of the plurality of logical switches with a first virtual fabric of the first plurality of virtual switch fabrics.
 19. The computer readable medium of claim 15, wherein the actions further comprise: partitioning the first network switch into a first plurality of virtual switch fabrics; partitioning a second network switch into a second plurality of virtual switch fabrics; defining a multi-chassis virtual fabric, comprising a virtual fabric of the first plurality of virtual switch fabrics and a virtual fabric of the second plurality of virtual switch fabrics; and associating a first logical switch of the first plurality of logical switches with the multi-chassis virtual fabric.
 20. The computer readable medium of claim 19, wherein the actions further comprise: partitioning the second network switch into a second plurality of logical switches; associating a second logical switch of the second plurality of logical switches with the multi-chassis virtual fabric; and communicating data between the first logical switch and the second logical switch across the multi-chassis virtual fabric.
 21. A network comprising: a plurality of external devices; a plurality of chassis, each comprising: a single-chassis fabric; and a switch configured for use with the single-chassis fabric; a first multi-chassis virtual fabric coupling the plurality of external devices, wherein the first multi-chassis virtual fabric comprises: a first virtual single-chassis fabric to which are coupled a first portion of the plurality of external devices, the first virtual single-chassis fabric selected from a plurality of virtual fabrics configured from the single-chassis fabric of a first chassis of the plurality of chassis; and a second virtual single-chassis fabric to which are coupled a second portion of the plurality of external devices, the second virtual single-chassis fabric selected from a plurality of virtual fabrics configured from the single-chassis fabric of a second chassis of the plurality of chassis; and software stored on a storage medium of each of the plurality of chassis, the software for instructing a processor of the corresponding chassis to perform actions comprising: partitioning the single-chassis fabric of the chassis into a plurality of virtual single-chassis fabrics; associating a virtual single-chassis fabric of the plurality of virtual single-chassis fabrics with the multi-chassis virtual fabric; partitioning the switch into a plurality of logical switches; and assigning a first logical switch of the plurality of logical switches to the multi-chassis virtual fabric.
 22. The network of claim 21, the software for instructing the processor of the corresponding chassis to perform actions further comprising: linking the first logical switch with a second logical switch of another of the plurality of chassis and assigned to the multi-chassis virtual fabric; and communicating data between the first logical switch and the second logical switch.
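
By way of illustration only, the partitioning recited above can be sketched in a few lines of Python. The sketch is not the claimed implementation. It assumes that the network resources are physical ports numbered from zero, that a default logical switch initially holds every port, and that a logical switch joins a virtual fabric by carrying that fabric's identifier. The names `Chassis`, `LogicalSwitch`, `create_logical_switch`, `assign_port`, and `fabric_members` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class LogicalSwitch:
    """Hypothetical model of one logical switch within a chassis."""
    name: str
    ports: Set[int] = field(default_factory=set)
    # Virtual fabric identifier; None means not assigned to a virtual fabric.
    fabric_id: Optional[int] = None


class Chassis:
    """Hypothetical model of a partitionable single-chassis switch."""

    def __init__(self, name: str, port_count: int) -> None:
        self.name = name
        # Assumption: the default logical switch starts out owning every port.
        self.default = LogicalSwitch(f"{name}/default", set(range(port_count)))
        self.switches: Dict[str, LogicalSwitch] = {"default": self.default}

    def create_logical_switch(self, key: str, fabric_id: int) -> LogicalSwitch:
        """Partition off a new logical switch tagged with a fabric identifier."""
        if key in self.switches:
            raise ValueError(f"logical switch {key!r} already exists")
        ls = LogicalSwitch(f"{self.name}/{key}", fabric_id=fabric_id)
        self.switches[key] = ls
        return ls

    def assign_port(self, key: str, port: int) -> None:
        """Move a port from the default logical switch to another one."""
        if port not in self.default.ports:
            raise ValueError(f"port {port} is not available for assignment")
        target = self.switches[key]
        self.default.ports.remove(port)
        target.ports.add(port)


def fabric_members(chassis_list: List[Chassis], fabric_id: int) -> List[str]:
    """Names of logical switches, on any chassis, sharing a fabric identifier."""
    return sorted(
        ls.name
        for chassis in chassis_list
        for ls in chassis.switches.values()
        if ls.fabric_id == fabric_id
    )
```

In this sketch, creating a logical switch with the same fabric identifier on two `Chassis` objects and passing both to `fabric_members` returns one logical switch per chassis, which loosely mirrors a multi-chassis virtual fabric. Any port never handed to `assign_port` remains with the default logical switch.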